Inter-prediction refinement based on bidirectional optical flow (BIO)
Patent abstract:
A video decoder can be configured to determine that a block of video data is encoded using a bidirectional inter-prediction mode; determine that the block of video data is encoded using a bidirectional optical flow (BIO) process; inter-predict the block of video data according to the bidirectional inter-prediction mode; perform the BIO process for the block, wherein performing the BIO process for the block comprises determining a single motion vector refinement for a group of pixels in the block, where the group of pixels comprises at least two pixels; refine the group of pixels based on the single motion vector refinement; and transmit a refined BIO predictive block of video data comprising the refined group of pixels.

Publication number: BR112019018689A2
Application number: R112019018689
Filing date: 2018-03-13
Publication date: 2020-04-07
Inventors: Chuang Hsiao-Chiang; Chen Jianle; Karczewicz Marta; Chien Wei-Jung; Li Xiang; Chen Yi-Wen
Applicant: Qualcomm Inc
IPC main classification:
Patent description:
REFINING INTER-PREDICTION BASED ON BIDIRECTIONAL OPTICAL FLOW (BIO)

[01] This application claims the benefit of US provisional patent application no. 62/470,809, filed on March 13, 2017, which is hereby incorporated by reference in its entirety.

TECHNICAL FIELD

[02] The present invention relates to video encoding and decoding.

BACKGROUND

[03] Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, e-book readers, digital cameras, digital recording devices, digital media players, video game devices, video game consoles, satellite or cellular radio telephones, so-called smart phones, video teleconferencing devices, video streaming devices and the like. Digital video devices implement video coding techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10, Advanced Video Coding (AVC), ITU-T H.265/High Efficiency Video Coding (HEVC), and extensions of such standards. Video devices can transmit, receive, encode, decode and/or store digital video information more efficiently by implementing such video coding techniques.

[04] Video coding techniques include spatial (intra-image) prediction and/or temporal (inter-image) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (that is, a video frame or a portion of a video frame) can be divided into video blocks. Video blocks in an intra-coded (I) slice of an image are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same image. Video blocks in an inter-coded (P or B) slice of an image can use spatial prediction with respect to reference samples in neighboring blocks in the same image, or temporal prediction with respect to reference samples in other reference images. Images may be referred to as frames, and reference images may be referred to as reference frames.

[05] Spatial or temporal prediction results in a predictive block for a block to be coded. Residual data represents pixel differences between the original block to be coded and the predictive block. An inter-coded block is coded according to a motion vector that points to a block of reference samples forming the predictive block, and the residual data indicates the difference between the coded block and the predictive block. An intra-coded block is coded according to an intra-coding mode and residual data. For additional compression, the residual data can be transformed from a pixel domain to a transform domain, resulting in residual transform coefficients, which can then be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, can be scanned to produce a one-dimensional vector of transform coefficients, and entropy coding can be applied to achieve even more compression.
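To make the hybrid coding flow of [05] concrete, the following is a minimal sketch of the residual computation and uniform scalar quantization steps; it is not part of the patent text. The 4x4 block size, the function names and the rounding rule are illustrative assumptions, and the standard-specific transform between the two steps is omitted.

```cpp
#include <array>
#include <cstdint>

using Block4x4 = std::array<std::array<int16_t, 4>, 4>; // illustrative size

// Residual data per [05]: pixel differences between the original block and
// the predictive block.
Block4x4 computeResidual(const Block4x4& original, const Block4x4& prediction) {
    Block4x4 residual{};
    for (int y = 0; y < 4; ++y)
        for (int x = 0; x < 4; ++x)
            residual[y][x] =
                static_cast<int16_t>(original[y][x] - prediction[y][x]);
    return residual;
}

// Uniform scalar quantization of transform coefficients with round-to-nearest;
// the step size is an illustrative stand-in for a standard quantizer.
Block4x4 quantize(const Block4x4& coeffs, int step) {
    Block4x4 q{};
    for (int y = 0; y < 4; ++y)
        for (int x = 0; x < 4; ++x) {
            int c = coeffs[y][x];
            q[y][x] = static_cast<int16_t>(
                (c >= 0 ? c + step / 2 : c - step / 2) / step);
        }
    return q;
}
```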
SUMMARY

[06] In general, the techniques of this disclosure relate to improvements in bidirectional optical flow (BIO) video coding techniques used in combination with bidirectional inter-prediction.

[07] According to one example, a method of decoding video data includes determining that a block of video data is encoded using a bidirectional inter-prediction mode; determining that the block of video data is encoded using a bidirectional optical flow (BIO) process; inter-predicting the block of video data according to the bidirectional inter-prediction mode; performing the BIO process for the block, wherein performing the BIO process for the block comprises determining a single motion vector refinement for a group of pixels in the block and refining the group of pixels based on the single motion vector refinement, where the group of pixels comprises at least two pixels; and transmitting a refined BIO predictive block of video data comprising the refined group of pixels.

[08] In another example, a device for decoding video data includes a memory configured to store video data; and one or more processors configured to determine that a block of video data is encoded using a bidirectional inter-prediction mode; determine that the block of video data is encoded using a bidirectional optical flow (BIO) process; inter-predict the block of video data according to the bidirectional inter-prediction mode; perform the BIO process for the block, wherein to perform the BIO process for the block, the one or more processors are configured to determine a single motion vector refinement for a group of pixels in the block, where the group of pixels comprises at least two pixels, and refine the group of pixels based on the single motion vector refinement; and transmit a refined BIO predictive block of video data comprising the refined group of pixels.

[09] In another example, an apparatus for decoding video data includes means for determining that a block of video data is encoded using a bidirectional inter-prediction mode; means for determining that the block of video data is encoded using a bidirectional optical flow (BIO) process; means for inter-predicting the block of video data according to the bidirectional inter-prediction mode; means for performing the BIO process for the block, wherein the means for performing the BIO process for the block comprises means for determining a single motion vector refinement for a group of pixels in the block and means for refining the group of pixels based on the single motion vector refinement, where the group of pixels comprises at least two pixels; and means for transmitting a refined BIO predictive block of video data comprising the refined group of pixels.
[010] In another example, a computer-readable storage medium stores instructions that, when executed by one or more processors, cause the one or more processors to determine that a block of video data is encoded using a bidirectional inter-prediction mode; determine that the block of video data is encoded using a bidirectional optical flow (BIO) process; inter-predict the block of video data according to the bidirectional inter-prediction mode; perform the BIO process for the block, wherein to perform the BIO process for the block, the instructions cause the one or more processors to determine a single motion vector refinement for a group of pixels in the block and refine the group of pixels based on the single motion vector refinement, where the group of pixels comprises at least two pixels; and transmit a refined BIO predictive block of video data comprising the refined group of pixels.

[011] The details of one or more examples of the disclosure are set forth in the accompanying drawings and in the description below. Other features, objects and advantages will be apparent from the description, drawings and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

[012] Figure 1 is a block diagram illustrating an example video encoding and decoding system that can use techniques for bidirectional optical flow.

[013] Figure 2 is a conceptual diagram illustrating an example of unilateral motion estimation (ME) as a block matching algorithm (BMA) performed for motion-compensated frame rate up-conversion (MC-FRUC).

[014] Figure 3 is a conceptual diagram illustrating an example of bilateral ME as a BMA performed for MC-FRUC.

[015] Figure 4A shows spatial neighboring MV candidates for merge mode.

[016] Figure 4B shows spatial neighboring MV candidates for AMVP mode.

[017] Figure 5A shows an example of a TMVP candidate.

[018] Figure 5B shows an example of MV scaling.

[019] Figure 6 shows an example of an optical flow trajectory.

[020] Figure 7 shows an example of BIO for an 8x4 block.

[021] Figure 8 shows an example of modified BIO for an 8x4 block.

[022] Figures 9A and 9B show examples of sub-blocks where OBMC applies.

[023] Figures 10A-10D show examples of OBMC weights.

[024] Figure 11 shows an example of the overall MC process in JEM 5.

[025] Figures 12A-12D show examples of weighting functions.

[026] Figure 13 shows an example of BIO derivation according to techniques of this disclosure.

[027] Figure 14 shows an example of BIO derivation according to techniques of this disclosure.

[028] Figure 15 shows an example of BIO derivation according to techniques of this disclosure.

[029] Figure 16 shows an example of BIO derivation according to techniques of this disclosure.

[030] Figure 17 shows an example of BIO derivation according to techniques of this disclosure.

[031] Figure 18 shows an example of BIO derivation according to techniques of this disclosure.

[032] Figure 19 is a block diagram illustrating an example of a video encoder.

[033] Figure 20 is a block diagram illustrating an example of a video decoder that can implement techniques for bidirectional optical flow.

[034] Figure 21 is a flowchart illustrating an example method of decoding video data according to techniques described in this disclosure.
DETAILED DESCRIPTION

[035] In general, the techniques of this disclosure relate to improvements in bidirectional optical flow (BIO) video coding techniques. More specifically, the techniques of this disclosure relate to inter-prediction and BIO motion vector reconstruction for video coding, and to BIO-based refinement of inter-prediction. BIO can be applied during motion compensation. In general, BIO is used to modify a motion vector on a per-pixel (for example, per-sample) basis for a current block, so that pixels of the current block are predicted using corresponding offset values applied to the predictive block. BIO has the effect of creating a new motion vector; however, in an effective implementation of BIO, the predictive block is modified by adding offsets while the motion vector itself is not actually modified.

[036] The techniques of this disclosure can be applied to any existing video codec, such as those conforming to ITU-T H.264/AVC (Advanced Video Coding) or High Efficiency Video Coding (HEVC), also referred to as ITU-T H.265. H.264 is described in International Telecommunication Union, "Advanced video coding for generic audiovisual services", SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS, Infrastructure of audiovisual services - Coding of moving video, H.264, June 2011, and H.265 is described in International Telecommunication Union, "High efficiency video coding", SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS, Infrastructure of audiovisual services - Coding of moving video, April 2015. The techniques of this disclosure may also be applied to any other previous or future video coding standards as an efficient coding tool.

[037] An overview of HEVC is given in G.J. Sullivan, J.-R. Ohm, W.-J. Han, T. Wiegand, "Overview of the High Efficiency Video Coding (HEVC) Standard", IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649-1668, December 2012. The latest HEVC draft specification is available at http://phenix.int-evry.fr/jct/doc_end_user/documents/14_Vienna/wg11/JCTVC-N1003-v1.zip. The most recent version of the Final Draft International Standard (FDIS) of HEVC is described in JCTVC-L1003_v34, available at http://phenix.it-sudparis.eu/jct/doc_end_user/documents/12_Geneva/wg11/JCTVC-L1003-v34.zip.

[038] Other video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and the Scalable Video Coding (SVC) and Multi-view Video Coding (MVC) extensions of H.264, as well as the HEVC extensions, such as the range extension, the multi-view extension (MV-HEVC) and the scalable extension (SHVC). In April 2015, the Video Coding Experts Group (VCEG) started a new research project aimed at a next generation of video coding standards. The reference software is called HM-KTA.

[039] ITU-T VCEG (Q6/16) and ISO/IEC MPEG (JTC 1/SC 29/WG 11) are now studying the potential need for standardization of future video coding technology with a compression capability that significantly exceeds that of the current HEVC standard (including its current extensions and near-term extensions for screen content coding and high dynamic range coding).
The groups are working together on this exploration activity in a joint collaboration effort known as the Joint Video Exploration Team (JVET) to evaluate compression technology designs proposed by experts in this field. The JVET first met during 19-21 October 2015. A description of the Joint Exploration Test Model (JEM) algorithm is given in JVET-E1001. A reference software version, Joint Exploration Model 5 (JEM 5), described in J. Chen, E. Alshina, G.J. Sullivan, J.-R. Ohm, J. Boyce, "Algorithm description of Joint Exploration Test Model 5", JVET-E1001, January 2017, can be downloaded from https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/tags/HM-16.6-JEM-5.0.1/. The latest version of the reference software, Joint Exploration Model 7 (JEM 7), described in J. Chen, E. Alshina, G.J. Sullivan, J.-R. Ohm, J. Boyce, "Algorithm description of Joint Exploration Test Model 7", JVET-G1001, July 2017, can be downloaded from https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/tags/HM-16.6-JEM-7.0/.

[040] Certain video coding techniques, such as those of H.264 and HEVC that are related to the techniques of this disclosure, are described below. Certain techniques of this disclosure may be described with reference to H.264 and/or HEVC to aid understanding, but the techniques described are not necessarily limited to H.264 or HEVC and can be used in combination with other coding standards and other coding tools.

[041] The following discussion concerns motion information. In general, an image is divided into blocks, each of which can be predictively encoded. The prediction of a current block can generally be performed using intra-prediction techniques (using data from the image including the current block) or inter-prediction techniques (using data from an image previously encoded relative to the image including the current block). Inter-prediction includes both unidirectional and bidirectional prediction.

[042] For each inter-predicted block, a set of motion information can be available. A set of motion information can contain motion information for forward and backward prediction directions. Here, the forward and backward prediction directions are the two prediction directions of a bidirectional prediction mode, and the terms "forward" and "backward" do not necessarily have a geometric meaning. Instead, they generally correspond to whether the reference images are to be displayed before ("backward") or after ("forward") the current image. In some examples, the forward and backward prediction directions may correspond to reference image list 0 (RefPicList0) and reference image list 1 (RefPicList1) of a current image. When only one reference image list is available for an image or slice, only RefPicList0 is available, and the motion information of each block of the slice always refers to an image of RefPicList0 (that is, it is forward).

[043] For each prediction direction, the motion information contains a reference index and a motion vector. In some cases, for simplicity, the motion vector itself may be referred to in a way that assumes it has an associated reference index. A reference index is used to identify a reference image in the current reference image list (RefPicList0 or RefPicList1). A motion vector has a horizontal (x) component and a vertical (y) component. In general, the horizontal component indicates a horizontal displacement in a reference image, relative to the position of a current block in a current image, needed to locate the x-coordinate of a reference block, while the vertical component indicates a vertical displacement in the reference image, relative to the position of the current block, needed to locate the y-coordinate of the reference block.
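The motion information of [042]-[043] can be pictured as a small per-block record; the following C++ sketch is illustrative only, and its field names are not taken from any standard.

```cpp
#include <cstdint>

struct MotionVector {
    int16_t x;  // horizontal displacement in the reference image
    int16_t y;  // vertical displacement in the reference image
};

// One set of motion information: up to two prediction directions
// (list 0 = forward, list 1 = backward), each with a reference index into
// its reference image list and a motion vector.
struct MotionInfo {
    bool         usesList[2];  // which prediction directions are used
    int8_t       refIdx[2];    // index into RefPicList0 / RefPicList1
    MotionVector mv[2];        // one motion vector per used direction
};
```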
[044] Picture order count (POC) values are widely used in video coding standards to identify the display order of images. Although there are cases where two images within one coded video sequence can have the same POC value, this typically does not occur within a coded video sequence. Thus, POC values of images are generally unique and can uniquely identify the corresponding images. When multiple coded video sequences are present in a bit stream, images having the same POC value may be closer to each other in terms of decoding order. POC values of images are typically used for reference image list construction, derivation of reference image sets as in HEVC, and motion vector scaling.

[045] E. Alshina, A. Alshin, J.-H. Min, K. Choi, A. Saxena, M. Budagavi, "Known tools performance investigation for next generation video coding", ITU-T Telecommunication Standardization Sector, STUDY GROUP 16 Question 6, Video Coding Experts Group (VCEG), VCEG-AZ05, June 2015, Warsaw, Poland (hereinafter "Alshina 1"), and A. Alshin, E. Alshina, T. Lee, "Bi-directional optical flow for improving motion compensation", Picture Coding Symposium (PCS), Nagoya, Japan, 2010 (hereinafter "Alshina 2"), described a method called bidirectional optical flow (BIO). It is based on pixel-level optical flow. According to Alshina 1 and Alshina 2, BIO is only applied to blocks that have both forward and backward prediction. BIO as described in Alshina 1 and Alshina 2 is summarized below.

[046] Given a pixel value (for example, a luma sample or a chroma sample) $I_t$ at time $t$, its first-order Taylor expansion is

$$I_t = I_{t_0} + \frac{\partial I_{t_0}}{\partial t}(t - t_0) \qquad (A)$$

[047] $I_{t_0}$ is on the motion trajectory of $I_t$. That is, the motion from $I_{t_0}$ to $I_t$ is considered in the formula.

[048] Under the assumption of optical flow:

$$\frac{dI}{dt} = \frac{\partial I}{\partial t} + \frac{\partial I}{\partial x}\cdot\frac{dx}{dt} + \frac{\partial I}{\partial y}\cdot\frac{dy}{dt} = 0$$

so that

$$\frac{\partial I}{\partial t} = -\frac{\partial I}{\partial x}\cdot\frac{dx}{dt} - \frac{\partial I}{\partial y}\cdot\frac{dy}{dt}$$

Let $G_x = \frac{\partial I}{\partial x}$ and $G_y = \frac{\partial I}{\partial y}$ (the gradient), and equation (A) becomes

$$I_t = I_{t_0} - G_{x0}\cdot\frac{dx}{dt}(t - t_0) - G_{y0}\cdot\frac{dy}{dt}(t - t_0) \qquad (B)$$

[049] Regarding $\frac{dx}{dt}$ and $\frac{dy}{dt}$ as the moving speeds $V_{x0}$ and $V_{y0}$, equation (B) becomes

$$I_t = I_{t_0} - G_{x0}\cdot V_{x0}\cdot(t - t_0) - G_{y0}\cdot V_{y0}\cdot(t - t_0) \qquad (C)$$

[051] Suppose, as an example, a forward reference at $t_0$ and a backward reference at $t_1$, with

$$t_0 - t = t - t_1 = \Delta t = 1$$

[052] This leads to:

$$I_t = \frac{I_{t_0} + I_{t_1}}{2} + \frac{(G_{x0}V_{x0} - G_{x1}V_{x1}) + (G_{y0}V_{y0} - G_{y1}V_{y1})}{2} \qquad (D)$$

[053] It is further assumed that $V_{x0} = V_{x1} = V_x$ and $V_{y0} = V_{y1} = V_y$, since the motion is along the trajectory. Thus, equation (D) becomes

$$I_t = \frac{I_{t_0} + I_{t_1}}{2} + \frac{\Delta G_x \cdot V_x + \Delta G_y \cdot V_y}{2} \qquad (E)$$

where $\Delta G_x = G_{x0} - G_{x1}$ and $\Delta G_y = G_{y0} - G_{y1}$ can be calculated based on the reconstructed references. Since $\frac{I_{t_0} + I_{t_1}}{2}$ is the regular bi-prediction, $\frac{\Delta G_x \cdot V_x + \Delta G_y \cdot V_y}{2}$ is hereafter called the BIO offset, for convenience.

[054] $V_x$ and $V_y$ are derived at the encoder and the decoder by minimizing the following distortion:

$$\min \sum_{\text{block}} \Delta^2, \qquad \Delta = (I_{t_0} - I_{t_1}) + (G_{x0} + G_{x1})\,V_x + (G_{y0} + G_{y1})\,V_y$$

where $\Delta$ is the per-sample difference between the two one-sided predictions, summed over the block.
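Assuming the derivation above, the refined sample of equation (E) can be sketched as follows. The inputs (the two motion-compensated predictors, the gradient differences, and the BIO motion from the minimization of [054]) are assumed to be computed elsewhere; all names are illustrative.

```cpp
// Sketch of equation (E) for one sample, under the dt = 1 assumption of [051].
double bioPredictionSample(double i0, double i1,      // I_t0, I_t1
                           double dGx, double dGy,    // Gx0 - Gx1, Gy0 - Gy1
                           double vx, double vy) {    // BIO motion Vx, Vy
    double biAverage = (i0 + i1) / 2.0;               // regular bi-prediction
    double bioOffset = (dGx * vx + dGy * vy) / 2.0;   // BIO offset of (E)
    return biAverage + bioOffset;
}
```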
[055] With $V_x$ and $V_y$ derived, the final prediction of the block is calculated with equation (E). $V_x$ and $V_y$ are called the BIO motion, for convenience.

[056] In general, a video encoder performs BIO during motion compensation. That is, after the video encoder determines a motion vector for a current block, the video encoder produces a predicted block for the current block using motion compensation with respect to the motion vector. In general, the motion vector identifies the location of a reference block relative to the current block in a reference image. When performing BIO, a video encoder modifies the motion vector on a per-pixel basis for the current block. That is, instead of retrieving each pixel of the reference block as a block unit, according to BIO, the video encoder determines per-pixel modifications to the motion vector for the current block and constructs the reference block so that the reference block includes the pixels identified by the motion vector and the per-pixel modification for the corresponding pixel of the current block. In this way, BIO can be used to produce a more accurate reference block for the current block.

[057] Figure 1 is a block diagram illustrating an example video encoding and decoding system 10 that can use techniques for bidirectional optical flow. As shown in figure 1, system 10 includes a source device 12 that provides encoded video data to be decoded at a later time by a destination device 14. In particular, source device 12 provides the video data to the destination device 14 via computer-readable media 16. Source device 12 and destination device 14 can be any of a wide range of devices, including desktop computers, notebook (that is, laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called smart phones, so-called smart pads, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices or the like. In some cases, the source device 12 and the destination device 14 may be equipped for wireless communication.

[058] Destination device 14 can receive the encoded video data to be decoded via computer-readable media 16. Computer-readable media 16 can be any type of media or device capable of moving the encoded video data from source device 12 to destination device 14. In one example, computer-readable media 16 can be a communication medium that enables source device 12 to transmit encoded video data directly to destination device 14 in real time. The encoded video data can be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to the destination device 14. The communication medium can be any wireless or wired communication medium (or combination thereof), such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium can form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations or any other equipment that may be useful to facilitate communication from the source device 12 to the destination device 14.

[059] In some examples, encoded data can be output from output interface 22 to a storage device. Similarly, encoded data can be accessed from the storage device via the input interface.
The storage device may include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. In an additional example, the storage device can correspond to a file server or another intermediate storage device that can store the encoded video data generated by the source device 12. The destination device 14 can access stored video data from the storage device via streaming or download. The file server can be any type of server capable of storing encoded video data and transmitting that encoded video data to the destination device 14. Example file servers include a web server (for example, for a website), an FTP server, network attached storage (NAS) devices, or a local disk drive. The destination device 14 can access the encoded video data through any standard data connection, including an Internet connection. This may include a wireless channel (for example, a Wi-Fi connection), a wired connection (for example, DSL, cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the storage device can be a streaming transmission, a download transmission, or a combination of both.

[060] The techniques of this disclosure are not limited to wireless applications or settings. The techniques can be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet streaming video transmissions such as dynamic adaptive streaming over HTTP (DASH), digital video that is encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 can be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting and/or video telephony.

[061] In the example of figure 1, source device 12 includes video source 18, video encoder 20 and output interface 22. Destination device 14 includes input interface 28, video decoder 30 and display device 32. In accordance with this disclosure, the video encoder 20 of the source device 12 can be configured to apply techniques for bidirectional optical flow. In other examples, a source device and a destination device may include other components or arrangements. For example, the source device 12 can receive video data from an external video source 18, such as an external camera. Similarly, the destination device 14 can interface with an external display device, instead of including an integrated display device.

[062] System 10 illustrated in figure 1 is merely one example. Techniques for bidirectional optical flow can be performed by any digital video encoding and/or decoding device. Although in general the techniques of this disclosure are performed by a video encoding device, the techniques can also be performed by a video encoder/decoder, typically referred to as a CODEC. In addition, the techniques of this disclosure can also be performed by a video processor.
The source device 12 and the destination device 14 are merely examples of such coding devices, in which source device 12 generates encoded video data for transmission to the destination device 14. In some examples, devices 12, 14 may operate in a substantially symmetrical manner, so that each of the devices 12, 14 includes video encoding and decoding components. Consequently, system 10 can support one-way or two-way transmission between video devices 12, 14, for example, for video streaming, video playback, video broadcasting, or video telephony.

[063] The video source 18 of the source device 12 may include a video capture device, such as a video camera, a video archive containing previously captured video data, and/or a video feed interface for receiving video from a video content provider. As an additional alternative, video source 18 can generate computer-graphics-based data as the source video, or a combination of live video, archived video and computer-generated video. In some cases, if the video source 18 is a video camera, the source device 12 and the destination device 14 can form so-called camera phones or video phones. As mentioned above, however, the techniques described in this disclosure may be applicable to video coding in general and can be applied to wireless and/or wired applications. In each case, the captured, pre-captured or computer-generated video can be encoded by the video encoder 20. The encoded video information can then be output via the output interface 22 onto computer-readable media 16.

[064] Computer-readable media 16 may include transient media, such as a wireless broadcast or wired network transmission, or storage media (that is, non-transient storage media) such as a hard disk, flash drive, compact disc, digital video disc, Blu-ray disc, or other computer-readable media. In some examples, a network server (not shown) can receive encoded video data from the source device 12 and provide the encoded video data to the destination device 14, for example, via network transmission. Similarly, a computing device of a media production facility, such as a disc stamping facility, can receive encoded video data from source device 12 and produce a disc containing the encoded video data. Therefore, computer-readable media 16 can be understood to include one or more computer-readable media of various forms, in various examples.

[065] Input interface 28 of destination device 14 receives information from computer-readable media 16. The information from computer-readable media 16 can include syntax information defined by video encoder 20, which is also used by video decoder 30, and which includes syntax elements that describe characteristics and/or processing of the video data. The display device 32 displays the decoded video data to a user and can be any of a variety of display devices such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.

[066] Video encoder 20 and video decoder 30 can operate according to a video coding standard, such as the High Efficiency Video Coding (HEVC) standard, also referred to as ITU-T H.265.
In some examples, video encoder 20 and video decoder 30 may operate in accordance with other proprietary or industry standards, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4 Part 10, Advanced Video Coding (AVC), or extensions of such standards. The techniques of this disclosure, however, are not limited to any particular coding standard. Other examples of video coding standards include MPEG-2 and ITU-T H.263. Although not shown in figure 1, in some aspects, video encoder 20 and video decoder 30 can each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a common data stream or in separate data streams. If applicable, MUX-DEMUX units can conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP).

[067] In HEVC and other video coding specifications, a video sequence typically includes a series of images. Images can also be referred to as frames. An image can include three sample arrays, denoted $S_L$, $S_{Cb}$ and $S_{Cr}$. $S_L$ is a two-dimensional array (for example, a block) of luma samples. $S_{Cb}$ is a two-dimensional array of Cb chrominance samples. $S_{Cr}$ is a two-dimensional array of Cr chrominance samples. Chrominance samples may also be referred to herein as chroma samples. In other instances, an image can be monochrome and include only an array of luma samples.

[068] To generate an encoded representation of an image, video encoder 20 can generate a set of coding tree units (CTUs). Each of the CTUs can include a coding tree block of luma samples, two corresponding coding tree blocks of chroma samples, and syntax structures used to encode the samples of the coding tree blocks. In monochrome images or images having three separate color planes, a CTU can include a single coding tree block and the syntax structures used to encode the samples of the coding tree block. A coding tree block can be an NxN block of samples. A CTU can also be referred to as a tree block or a largest coding unit (LCU). The CTUs of HEVC can be broadly analogous to macroblocks of other standards, such as H.264/AVC. However, a CTU is not necessarily limited to a specific size and can include one or more coding units (CUs). A slice can include an integer number of CTUs ordered consecutively in raster scan order.

[069] A CTB contains a quad-tree, the nodes of which are coding units. The size of a CTB can range from 16x16 to 64x64 in the HEVC main profile (although technically, 8x8 CTB sizes can be supported). A coding unit (CU) can be the same size as a CTB, though it can be as small as 8x8. Each coding unit is coded with one mode. When a CU is inter-coded, the CU can be further divided into 2 or 4 prediction units (PUs), or become just one PU when further division does not apply. When two PUs are present in a CU, the PUs can be half-size rectangles or two rectangles with 1/4 and 3/4 the size of the CU.

[070] To generate an encoded CTU, video encoder 20 can recursively perform quad-tree partitioning on the coding tree blocks of a CTU to divide the coding tree blocks into coding blocks, hence the name "coding tree units". A coding block can be an NxN block of samples.
A CU can include a coding block of luma samples and two corresponding coding blocks of chroma samples of an image that has a luma sample array, a Cb sample array and a Cr sample array, and syntax structures used to encode the samples of the coding blocks. In monochrome images or images having three separate color planes, a CU can include a single coding block and the syntax structures used to encode the samples of the coding block.

[071] The video encoder 20 can partition a coding block of a CU into one or more prediction blocks. A prediction block is a rectangular (that is, square or non-square) block of samples to which the same prediction is applied. A prediction unit (PU) of a CU can include a prediction block of luma samples, two corresponding prediction blocks of chroma samples, and syntax structures used to predict the prediction blocks. In monochrome images or images having three separate color planes, a PU can include a single prediction block and the syntax structures used to predict the prediction block. The video encoder 20 can generate predictive luma, Cb and Cr blocks for the luma, Cb and Cr prediction blocks of each PU of the CU.

[072] Video encoder 20 can use intra-prediction or inter-prediction to generate the predictive blocks for a PU. If the video encoder 20 uses intra-prediction to generate the predictive blocks of a PU, the video encoder 20 can generate the predictive blocks of the PU based on decoded samples of the image associated with the PU. If the video encoder 20 uses inter-prediction to generate the predictive blocks of a PU, the video encoder 20 can generate the predictive blocks of the PU based on decoded samples of one or more images other than the image associated with the PU. When the CU is inter-coded, one set of motion information can be present for each PU. In addition, each PU can be coded with a unique inter-prediction mode to derive the set of motion information.

[073] After video encoder 20 generates predictive luma, Cb and Cr blocks for one or more PUs of a CU, video encoder 20 can generate a luma residual block for the CU. Each sample in the luma residual block of the CU indicates a difference between a luma sample in one of the predictive luma blocks of the CU and a corresponding sample in the original luma coding block of the CU. Video encoder 20 can also generate a Cb residual block for the CU. Each sample in the Cb residual block of the CU can indicate a difference between a Cb sample in one of the predictive Cb blocks of the CU and a corresponding sample in the original Cb coding block of the CU. The video encoder 20 can also generate a Cr residual block for the CU. Each sample in the Cr residual block of the CU can indicate a difference between a Cr sample in one of the predictive Cr blocks of the CU and a corresponding sample in the original Cr coding block of the CU.

[074] The video encoder 20 can use quad-tree partitioning to decompose the luma, Cb and Cr residual blocks of a CU into one or more luma, Cb and Cr transform blocks. A transform block is a rectangular (for example, square or non-square) block of samples to which the same transform is applied. A transform unit (TU) of a CU can include a transform block of luma samples, two corresponding transform blocks of chroma samples, and syntax structures used to transform the transform block samples. In this way, each TU of a CU can be associated with a luma transform block, a Cb transform block, and a Cr transform block. The luma transform block associated with the TU can be a sub-block of the luma residual block of the CU. The Cb transform block can be a sub-block of the Cb residual block of the CU. The Cr transform block can be a sub-block of the Cr residual block of the CU.
In monochrome images or images having three separate color planes, a TU can include a single transform block and the syntax structures used to transform the samples of the transform block.

[075] The video encoder 20 can apply one or more transforms to a luma transform block of a TU to generate a luma coefficient block for the TU. A coefficient block can be a two-dimensional array of transform coefficients. A transform coefficient can be a scalar quantity. The video encoder 20 can apply one or more transforms to a Cb transform block of a TU to generate a Cb coefficient block for the TU. The video encoder 20 can apply one or more transforms to a Cr transform block of a TU to generate a Cr coefficient block for the TU.

[076] After generating a coefficient block (for example, a luma coefficient block, a Cb coefficient block or a Cr coefficient block), the video encoder 20 can quantize the coefficient block. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the transform coefficients, providing additional compression. After video encoder 20 quantizes a coefficient block, video encoder 20 can entropy-encode syntax elements indicating the quantized transform coefficients.

[077] The video encoder 20 can output a bit stream that includes a sequence of bits that forms a representation of encoded images and associated data. The bit stream can include a sequence of NAL units. A NAL unit is a syntax structure containing an indication of the type of data in the NAL unit and bytes containing that data in the form of an RBSP interspersed as necessary with emulation prevention bits. Each of the NAL units includes a NAL unit header and encapsulates an RBSP. The NAL unit header may include a syntax element indicating a NAL unit type code. The NAL unit type code specified by the NAL unit header of a NAL unit indicates the type of the NAL unit. An RBSP can be a syntax structure containing an integer number of bytes that is encapsulated within a NAL unit. In some instances, an RBSP includes zero bits.

[078] Different types of NAL units can encapsulate different types of RBSPs. For example, a first type of NAL unit can encapsulate an RBSP for a PPS, a second type of NAL unit can encapsulate an RBSP for an encoded slice, a third type of NAL unit can encapsulate an RBSP for SEI messages, and so on. NAL units that encapsulate RBSPs for video coding data (as opposed to RBSPs for parameter sets and SEI messages) can be referred to as VCL NAL units.

[079] The video decoder 30 can receive a bit stream generated by the video encoder 20. In addition, the video decoder 30 can parse the bit stream to obtain syntax elements from the bit stream. The video decoder 30 can reconstruct the images of the video data based at least in part on the syntax elements obtained from the bit stream. The process of reconstructing the video data can in general be reciprocal to the process performed by the video encoder 20. In addition, the video decoder 30 can inverse-quantize coefficient blocks associated with TUs of a current CU. The video decoder 30 can perform inverse transforms on the coefficient blocks to reconstruct transform blocks associated with the TUs of the current CU. The video decoder 30 can reconstruct the coding blocks of the current CU by adding the samples of the predictive blocks for the PUs of the current CU to corresponding samples of the transform blocks of the TUs of the current CU. By reconstructing the coding blocks for each CU of an image, the video decoder 30 can reconstruct the image.
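As a rough illustration of the reconstruction step in [079], a decoder adds the residual samples to the prediction samples and clips to the valid sample range. The sketch below assumes 8-bit samples and flat row-major buffers, and omits in-loop filtering; it is illustrative, not decoder source code.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Reconstruction per [079]: recon = clip(prediction + residual).
std::vector<uint8_t> reconstructBlock(const std::vector<int16_t>& prediction,
                                      const std::vector<int16_t>& residual) {
    std::vector<uint8_t> recon(prediction.size());
    for (size_t i = 0; i < prediction.size(); ++i) {
        int sum = prediction[i] + residual[i];
        recon[i] = static_cast<uint8_t>(std::clamp(sum, 0, 255));
    }
    return recon;
}
```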
[080] In accordance with the techniques of this disclosure, video encoder 20 and/or video decoder 30 can additionally perform bidirectional optical flow (BIO) techniques during motion compensation, as discussed in more detail below.

[081] Video encoder 20 and video decoder 30 can each be implemented as any of a variety of suitable encoder or decoder circuitry, as applicable, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic circuitry, software, hardware, firmware or any combinations thereof. Each of video encoder 20 and video decoder 30 can be included in one or more encoders or decoders, either of which can be integrated as part of a combined video encoder/decoder (CODEC). A device including video encoder 20 and/or video decoder 30 can include an integrated circuit, a microprocessor, and/or a wireless communication device, such as a cellular telephone.

[082] Figure 2 is a conceptual diagram illustrating an example of unilateral motion estimation (ME) as a block matching algorithm (BMA) performed for motion-compensated frame rate up-conversion (MC-FRUC). In general, a video encoder (such as video encoder 20 or video decoder 30) performs unilateral ME to obtain motion vectors (MVs), such as MV 112, by searching for the best matching block (for example, reference block 108) of reference frame 102 for current block 106 of current frame 100. Then, the video encoder interpolates an interpolated block 110 along the motion trajectory of motion vector 112 in interpolated frame 104. That is, in the example of figure 2, motion vector 112 passes through the midpoints of current block 106, reference block 108 and interpolated block 110.

[083] As shown in figure 2, three blocks in three frames are involved following the motion trajectory. Although the current block 106 in the current frame 100 belongs to a coded block, the best matching block in reference frame 102 (that is, reference block 108) need not belong entirely to a coded block (that is, the best matching block might not fall on a coded block boundary, but may instead overlap such a boundary). Similarly, interpolated block 110 in interpolated frame 104 need not belong entirely to a coded block. Consequently, overlapped regions of blocks and unfilled (hole) regions can occur in the interpolated frame 104.

[084] To handle overlaps, simple FRUC algorithms merely involve averaging and overwriting the overlapped pixels. In addition, holes can be covered by the pixel values of a reference or current frame. However, these algorithms can result in blocking artifacts and blurring. Consequently, motion field segmentation, successive extrapolation using the discrete Hartley transform, and image inpainting can be used to handle holes and overlaps without increasing blocking artifacts and blurring.
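A minimal sketch of the unilateral interpolation of [082], under simplifying assumptions not in the source: integer-pel motion, a square block, and border clamping in place of the hole/overlap handling that [084] addresses separately. Because the interpolated frame lies at the temporal midpoint, the block is fetched halfway along the motion vector.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Frame { int width, height; std::vector<uint8_t> samples; };

static uint8_t sampleAt(const Frame& f, int x, int y) {
    x = std::clamp(x, 0, f.width - 1);   // clamp at the picture border
    y = std::clamp(y, 0, f.height - 1);
    return f.samples[static_cast<size_t>(y) * f.width + x];
}

// Fill one block of the interpolated frame halfway along the motion path.
void interpolateBlock(const Frame& ref, int blockX, int blockY, int size,
                      int mvX, int mvY, Frame& interp) {
    for (int y = 0; y < size; ++y)
        for (int x = 0; x < size; ++x)
            interp.samples[static_cast<size_t>(blockY + y) * interp.width
                           + (blockX + x)] =
                sampleAt(ref, blockX + x + mvX / 2, blockY + y + mvY / 2);
}
```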
[085] Figure 3 is a conceptual diagram illustrating an example of bilateral ME as a BMA performed for MC-FRUC. Bilateral ME is another solution (in MC-FRUC) that can be used to avoid the problems caused by overlaps and holes. A video encoder (such as video encoder 20 and/or video decoder 30) performing bilateral ME obtains MVs 132, 134 passing through interpolated block 130 of interpolated frame 124 (which is intermediate to current frame 120 and reference frame 122) using the temporal symmetry between current block 126 of current frame 120 and reference block 128 of reference frame 122. As a result, the video encoder does not generate overlaps and holes in the interpolated frame 124. Since it is assumed that the current block 126 is a block that the video encoder processes in a certain order, for example, as in the case of video coding, a sequence of such blocks would cover the entire intermediate image without overlap. For example, in the case of video coding, blocks can be processed in decoding order. Therefore, such a method may be more appropriate if FRUC ideas are to be considered in a video coding framework.

[086] S.-F. Tu, O.C. Au, Y. Wu, E. Luo and C.H. Yeung, "A novel framework for frame rate up conversion by predictive variable block-size motion estimated optical flow", International Congress on Image and Signal Processing (CISP), 2009, describes a hybrid scheme of block-level motion estimation and pixel-level optical flow for frame rate up-conversion. It was asserted that the hybrid scheme was better than either individual method.

[087] In the HEVC standard, there are two inter-prediction modes for a PU, named merge mode (with skip mode considered a special case of merge) and advanced motion vector prediction (AMVP) mode, respectively. In either AMVP or merge mode, a motion vector (MV) candidate list is maintained for multiple motion vector predictors. The motion vector(s), as well as the reference indices in merge mode, of the current PU are generated by taking one candidate from the MV candidate list.

[088] The MV candidate list contains up to 5 candidates for merge mode and only two candidates for AMVP mode. A merge candidate can contain a set of motion information, for example, motion vectors corresponding to both reference image lists (list 0 and list 1) and the reference indices. If a merge candidate is identified by a merge index, the reference images used for the prediction of the current blocks, as well as the associated motion vectors, are determined. However, under AMVP mode, for each potential prediction direction from either list 0 or list 1, a reference index needs to be explicitly signaled, together with an MV predictor (MVP) index to the MV candidate list, since the AMVP candidate contains only a motion vector. In AMVP mode, the predicted motion vectors can be further refined.

[089] A merge candidate corresponds to a full set of motion information, while an AMVP candidate contains only one motion vector for a specific prediction direction and reference index. The candidates for both modes are derived similarly from the same spatial and temporal neighboring blocks.

[090] Figure 4A shows spatial neighboring MV candidates for merge mode and figure 4B shows spatial neighboring MV candidates for AMVP mode. Spatial MV candidates are derived from the neighboring blocks shown in figures 4A and 4B for a specific PU (PU0), although the methods for generating the candidates from the blocks differ for the merge and AMVP modes.
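The signaling difference between the two modes of [087]-[089] can be summarized as follows. This is an illustrative sketch, not syntax taken from the HEVC specification.

```cpp
#include <cstdint>

// Merge mode: a single index selects a candidate carrying a complete set of
// motion information (motion vectors plus reference indices for list 0/list 1).
struct MergeSignaling {
    uint8_t mergeIdx;    // one of up to 5 merge candidates
};

// AMVP mode: per prediction direction, the reference index is signaled
// explicitly, together with an MVP index into the 2-candidate list and a
// motion vector difference used to refine the predicted motion vector.
struct AmvpSignalingPerDirection {
    uint8_t refIdx;      // explicit reference index into RefPicList0/1
    uint8_t mvpIdx;      // one of 2 AMVP candidates
    int16_t mvdX, mvdY;  // motion vector difference
};
```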
[091] In merge mode, up to four spatial MV candidates can be derived in the order shown with numbers in figure 4A, as follows: left (0, A1), above (1, B1), above-right (2, B0), below-left (3, A0) and above-left (4, B2).

[092] In AMVP mode, the neighboring blocks are divided into two groups: a left group consisting of blocks 0 and 1, and an above group consisting of blocks 2, 3 and 4, as shown in figure 4B. For each group, the potential candidate in a neighboring block referring to the same reference image as that indicated by the signaled reference index has the highest priority to be chosen to form the final candidate of the group. It is possible that no neighboring block contains a motion vector pointing to the same reference image. Therefore, if such a candidate cannot be found, the first available candidate will be scaled to form the final candidate, so that differences in temporal distance can be compensated.

[093] Figure 5A shows an example of a TMVP candidate and figure 5B shows an example of MV scaling. The temporal motion vector predictor (TMVP) candidate, if enabled and available, is added to the MV candidate list after the spatial motion vector candidates. The motion vector derivation process for the TMVP candidate is the same for both merge and AMVP modes; however, the target reference index for the TMVP candidate in merge mode is always set to 0.

[094] The primary block location for TMVP candidate derivation is the bottom-right block outside of the collocated PU, shown in figure 5A as block T, to compensate for the bias toward the above and left blocks used to generate the spatial neighboring candidates. However, if that block is located outside the current CTB row, or motion information is not available, the block is replaced with a center block of the PU.

[095] The motion vector for the TMVP candidate is derived from the collocated PU of the collocated image, indicated at the slice level. The motion vector for the collocated PU is called the collocated MV. Similar to the temporal direct mode in AVC, to derive the TMVP candidate motion vector, the collocated MV needs to be scaled to compensate for the differences in temporal distance, as shown in figure 5B.

[096] HEVC also uses motion vector scaling. It is assumed that the value of a motion vector is proportional to the distance between images in presentation time. A motion vector associates two images: the reference image and the image containing the motion vector (namely, the containing image). When a motion vector is used to predict another motion vector, the distance between the containing image and the reference image is calculated based on picture order count (POC) values.

[097] For a motion vector to be predicted, both its associated containing image and its reference image may be different. Therefore, a new distance (based on POC) is calculated, and the motion vector is scaled based on these two POC distances. For a spatial neighboring candidate, the containing images for the two motion vectors are the same, while the reference images are different. In HEVC, motion vector scaling applies to both TMVP and AMVP for spatial and temporal neighboring candidates.
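A sketch of the POC-distance scaling of [096]-[097], written in plain floating point for readability; HEVC itself performs this in clipped fixed-point arithmetic, and the variable names here are illustrative.

```cpp
struct Mv { int x, y; };

// Scale mv by the ratio of the new POC distance (current image to target
// reference) to the original POC distance (containing image to its
// reference). Assumes the original distance is non-zero.
Mv scaleMv(Mv mv, int pocCurrent, int pocTargetRef,
           int pocContaining, int pocContainingRef) {
    double tb = pocCurrent - pocTargetRef;         // new distance
    double td = pocContaining - pocContainingRef;  // original distance
    double s = tb / td;
    return { static_cast<int>(mv.x * s), static_cast<int>(mv.y * s) };
}
```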
[098] HEVC also uses artificial motion vector candidate generation. If a motion vector candidate list is not complete, artificial motion vector candidates are generated and inserted at the end of the list until the motion vector candidate list has a complete set of candidates. In merge mode, there are two types of artificial MV candidates: combined candidates, derived only for B-slices, and zero candidates, used only for AMVP if the first type does not provide enough artificial candidates. For each pair of candidates that are already in the candidate list and have the necessary motion information, bidirectional combined motion vector candidates are derived by combining the motion vector of a first candidate referring to an image in list 0 and the motion vector of a second candidate referring to an image in list 1.

[099] HEVC also uses a pruning process for candidate insertion. Candidates from different blocks may happen to be identical, which decreases the efficiency of a merge/AMVP candidate list. A pruning process can be applied to solve this problem: one candidate is compared against the others in the current candidate list to avoid inserting identical candidates, to a certain extent. To reduce complexity, only a limited number of pruning operations are applied, instead of comparing each potential candidate with all the others.

[0100] Aspects of bidirectional optical flow in JEM will now be described. Figure 6 shows an example of an optical flow trajectory. BIO is a pixel-wise motion refinement that is performed on top of block-wise motion compensation in the case of bi-prediction. Since BIO compensates the fine motion inside the block, enabling BIO effectively results in enlarging the block size for motion compensation. Sample-level motion refinement does not require exhaustive search or signaling, since there is an explicit equation that gives the fine motion vector for each sample.

[0101] Let $I^{(k)}$ be the luminance value from reference $k$ ($k = 0, 1$) after block motion compensation, and let $\partial I^{(k)}/\partial x$ and $\partial I^{(k)}/\partial y$ be the horizontal and vertical components of the $I^{(k)}$ gradient, respectively. Assuming the optical flow is valid, the motion vector field $(v_x, v_y)$ is given by the equation

$$\partial I^{(k)}/\partial t + v_x\,\partial I^{(k)}/\partial x + v_y\,\partial I^{(k)}/\partial y = 0 \qquad (1)$$

[0102] Combining the optical flow equation with Hermite interpolation for the motion trajectory of each sample, one obtains a unique polynomial of third order that matches both the function values $I^{(k)}$ and the derivatives $\partial I^{(k)}/\partial x$, $\partial I^{(k)}/\partial y$ at the ends. The value of this polynomial at $t = 0$ is the BIO prediction:

$$pred_{BIO} = \tfrac{1}{2}\left(I^{(0)} + I^{(1)} + \tfrac{v_x}{2}\left(\tau_1\,\partial I^{(1)}/\partial x - \tau_0\,\partial I^{(0)}/\partial x\right) + \tfrac{v_y}{2}\left(\tau_1\,\partial I^{(1)}/\partial y - \tau_0\,\partial I^{(0)}/\partial y\right)\right) \qquad (2)$$

[0103] Here, $\tau_0$ and $\tau_1$ denote the distances to the reference frames, as shown in figure 6. The distances $\tau_0$ and $\tau_1$ are calculated based on POC for Ref0 and Ref1: $\tau_0 = POC(\text{current}) - POC(\text{Ref0})$, $\tau_1 = POC(\text{Ref1}) - POC(\text{current})$. If both predictions come from the same time direction (both from the past or both from the future), then the signs are different, $\tau_0 \cdot \tau_1 < 0$. In this case, BIO is applied only if the predictions do not come from the same time moment ($\tau_0 \neq \tau_1$), both referenced regions have non-zero motion ($MVx_0, MVy_0, MVx_1, MVy_1 \neq 0$), and the block motion vectors are proportional to the temporal distances ($MVx_0 / MVx_1 = MVy_0 / MVy_1 = -\tau_0 / \tau_1$).
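The applicability conditions of [0103] for the same-direction case can be checked as sketched below; the proportionality test is cross-multiplied to avoid divisions. This is an illustrative reading of the conditions, not JEM source code.

```cpp
struct Mv { int x, y; };

bool bioApplicableSameDirection(int pocCur, int pocRef0, int pocRef1,
                                Mv mv0, Mv mv1) {
    int tau0 = pocCur - pocRef0;   // POC(current) - POC(Ref0)
    int tau1 = pocRef1 - pocCur;   // POC(Ref1) - POC(current)
    bool differentMoments = (tau0 != tau1);
    bool nonZeroMotion = (mv0.x != 0 || mv0.y != 0) &&
                         (mv1.x != 0 || mv1.y != 0);
    // MVx0/MVx1 = MVy0/MVy1 = -tau0/tau1, checked by cross-multiplication.
    bool proportional = (mv0.x * tau1 == -mv1.x * tau0) &&
                        (mv0.y * tau1 == -mv1.y * tau0);
    return differentMoments && nonZeroMotion && proportional;
}
```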
[0104] The motion vector field $(v_x, v_y)$ is determined by minimizing the difference $\Delta$ between the values at points A and B (the intersection of the motion trajectory with the reference frame planes in figure 6). The model uses only the first linear term of a local Taylor expansion for $\Delta$:

$$\Delta = \left(I^{(0)} - I^{(1)}\right) + v_x\left(\tau_1\,\partial I^{(1)}/\partial x + \tau_0\,\partial I^{(0)}/\partial x\right) + v_y\left(\tau_1\,\partial I^{(1)}/\partial y + \tau_0\,\partial I^{(0)}/\partial y\right) \qquad (3)$$

[0105] All the values in equation (3) depend on the sample location $(i', j')$, which has been omitted so far. Assuming the motion is consistent in the local surrounding area, $\Delta$ can be minimized within a $(2M+1) \times (2M+1)$ square window $\Omega$ centered on the currently predicted point $(i, j)$:

$$(v_x, v_y) = \arg\min_{v_x, v_y} \sum_{[i',j'] \in \Omega} \Delta^2[i', j'] \qquad (4)$$

[0106] For this optimization problem, a simplified solution that performs the minimization first in the vertical direction and then in the horizontal direction can be used, which results in:

$$v_x = (s_1 + r) > m\ ?\ \mathrm{clip3}\!\left(-thBIO,\ thBIO,\ -\frac{s_3}{s_1 + r}\right) : 0 \qquad (5)$$

$$v_y = (s_5 + r) > m\ ?\ \mathrm{clip3}\!\left(-thBIO,\ thBIO,\ -\frac{s_6 - v_x s_2 / 2}{s_5 + r}\right) : 0 \qquad (6)$$

where

$$s_1 = \sum_{[i',j'] \in \Omega} \left(\tau_1\,\partial I^{(1)}/\partial x + \tau_0\,\partial I^{(0)}/\partial x\right)^2;$$
$$s_3 = \sum_{[i',j'] \in \Omega} \left(I^{(1)} - I^{(0)}\right)\left(\tau_1\,\partial I^{(1)}/\partial x + \tau_0\,\partial I^{(0)}/\partial x\right);$$
$$s_2 = \sum_{[i',j'] \in \Omega} \left(\tau_1\,\partial I^{(1)}/\partial x + \tau_0\,\partial I^{(0)}/\partial x\right)\left(\tau_1\,\partial I^{(1)}/\partial y + \tau_0\,\partial I^{(0)}/\partial y\right);$$
$$s_5 = \sum_{[i',j'] \in \Omega} \left(\tau_1\,\partial I^{(1)}/\partial y + \tau_0\,\partial I^{(0)}/\partial y\right)^2;$$
$$s_6 = \sum_{[i',j'] \in \Omega} \left(I^{(1)} - I^{(0)}\right)\left(\tau_1\,\partial I^{(1)}/\partial y + \tau_0\,\partial I^{(0)}/\partial y\right) \qquad (7)$$

[0107] To avoid division by zero or by a very small value, regularization parameters $r$ and $m$ are introduced in equations (5) and (6):

$$r = 500 \cdot 4^{d-8} \qquad (8)$$

$$m = 700 \cdot 4^{d-8} \qquad (9)$$

Here, $d$ is the internal bit depth of the input video.

[0108] In some cases, the MV refinement of BIO can be unreliable due to noise or irregular motion. Therefore, in BIO, the magnitude of the MV refinement is clipped to a certain threshold (thBIO). The threshold value is determined based on whether all the reference images of the current image are from one direction. If all the reference images of the current image are from one direction, the threshold value is set to $12 \times 2^{14-d}$; otherwise, it is set to $12 \times 2^{13-d}$.

[0109] The gradients for BIO are calculated at the same time as the motion compensation interpolation, using operations consistent with the HEVC motion compensation process (2D separable FIR). The input for this 2D separable FIR is the same reference frame sample as for the motion compensation process, and the fractional position (fracX, fracY) according to the fractional part of the block motion vector. For the horizontal gradient $\partial I/\partial x$, the signal is first interpolated vertically using BIOfilterS corresponding to the fractional position fracY with de-scaling shift $d-8$; then the gradient filter BIOfilterG is applied in the horizontal direction corresponding to the fractional position fracX with de-scaling shift $18-d$. For the vertical gradient $\partial I/\partial y$, the gradient filter is first applied vertically using BIOfilterG corresponding to the fractional position fracY with de-scaling shift $d-8$; then the signal displacement is performed using BIOfilterS in the horizontal direction corresponding to the fractional position fracX with de-scaling shift $18-d$. The length of the interpolation filter for gradient calculation (BIOfilterG) and for signal displacement (BIOfilterF) is shorter (6-tap) in order to maintain reasonable complexity. Table 1 shows the filters used to calculate the gradients for different fractional positions of the block motion vector in BIO. Table 2 shows the interpolation filters used to generate the prediction signal in BIO.
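Equations (4)-(9) translate into a small accumulate-and-solve routine, sketched below in floating point for clarity (JEM operates in fixed point). The accumulation runs over the window $\Omega$; d0 denotes $I^{(1)} - I^{(0)}$ at a window position, and all names are illustrative.

```cpp
#include <algorithm>

struct BioWindowSums { double s1 = 0, s2 = 0, s3 = 0, s5 = 0, s6 = 0; };

// Accumulate the sums of equation (7) for one position [i', j'] in Omega.
void accumulate(BioWindowSums& s, double gx0, double gy0, double gx1,
                double gy1, double d0, double tau0, double tau1) {
    double tx = tau1 * gx1 + tau0 * gx0;  // horizontal gradient term
    double ty = tau1 * gy1 + tau0 * gy0;  // vertical gradient term
    s.s1 += tx * tx;
    s.s2 += tx * ty;
    s.s3 += d0 * tx;
    s.s5 += ty * ty;
    s.s6 += d0 * ty;
}

// Solve equations (5) and (6): vx first, then vy, clipped to thBIO, with
// the regularization parameters r and m of equations (8) and (9).
void solveBioMotion(const BioWindowSums& s, double r, double m,
                    double thBIO, double& vx, double& vy) {
    vx = (s.s1 + r) > m
             ? std::clamp(-s.s3 / (s.s1 + r), -thBIO, thBIO) : 0.0;
    vy = (s.s5 + r) > m
             ? std::clamp(-(s.s6 - vx * s.s2 / 2.0) / (s.s5 + r),
                          -thBIO, thBIO) : 0.0;
}
```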
[0109] The gradients for BIO are calculated at the same time as the motion compensation interpolation, using operations compatible with the HEVC motion compensation process (separable 2D FIR). The input for this separable 2D FIR is the same reference frame sample as for the motion compensation process, and the fractional position (fracX, fracY) according to the fractional part of the block motion vector. In the case of the horizontal gradient ∂I/∂x, the signal is first interpolated vertically using BIOfilterS corresponding to the fractional position fracY with de-scaling shift d−8; the gradient filter BIOfilterG is then applied in the horizontal direction corresponding to the fractional position fracX with de-scaling shift 18−d. In the case of the vertical gradient ∂I/∂y, the gradient filter is first applied vertically using BIOfilterG corresponding to the fractional position fracY with de-scaling shift d−8; the signal displacement is then performed using BIOfilterS in the horizontal direction corresponding to the fractional position fracX with de-scaling shift 18−d. The length of the interpolation filters for gradient calculation (BIOfilterG) and signal displacement (BIOfilterS) is shorter (6-tap) to maintain reasonable complexity. Table 1 shows the filters used to calculate the gradients for different fractional positions of the block motion vector in BIO. Table 2 shows the interpolation filters used to generate the prediction signal in BIO.

[0110] Figure 7 shows an example of the gradient calculation for an 8x4 block. For an 8x4 block, a video coder needs to fetch the motion compensated predictors (also referred to as MC predictors) and calculate the horizontal/vertical (HOR/VER) gradients of all the pixels in the current block as well as of the two outer lines of pixels, because solving v_x and v_y for each pixel requires the HOR/VER gradient values and the motion compensated predictors of the pixels in the window Ω centered on each pixel, as shown in equation (4). In JEM, the size of this window is set to 5x5, which means that a video coder needs to fetch the motion compensated predictors and calculate the gradients for the two outer lines of pixels.

Table 1: Filters for gradient calculation in BIO

  Fractional pel position | Gradient interpolation filter (BIOfilterG)
  0                       | { 8, -39,  -3, 46, -17, 5}
  1/16                    | { 8, -32, -13, 50, -18, 5}
  1/8                     | { 7, -27, -20, 54, -19, 5}
  3/16                    | { 6, -21, -29, 57, -18, 5}
  1/4                     | { 4, -17, -36, 60, -15, 4}
  5/16                    | { 3,  -9, -44, 61, -15, 4}
  3/8                     | { 1,  -4, -48, 61, -13, 3}
  7/16                    | { 0,   1, -54, 60,  -9, 2}
  1/2                     | {-1,   4, -57, 57,  -4, 1}

Table 2: Interpolation filters for generation of the prediction signal in BIO

  Fractional pel position | Prediction signal interpolation filter (BIOfilterS)
  0                       | { 0,   0, 64,  0,   0, 0}
  1/16                    | { 1,  -3, 64,  4,  -2, 0}
  1/8                     | { 1,  -6, 62,  9,  -3, 1}
  3/16                    | { 2,  -8, 60, 14,  -5, 1}
  1/4                     | { 2,  -9, 57, 19,  -7, 2}
  5/16                    | { 3, -10, 53, 24,  -8, 2}
  3/8                     | { 3, -11, 50, 29,  -9, 2}
  7/16                    | { 3, -11, 44, 35, -10, 3}
  1/2                     | { 3, -10, 35, 44, -11, 3}

[0111] In JEM, BIO is applied to all bidirectionally predicted blocks when the two predictions come from different reference pictures. When LIC is enabled for a CU, BIO is disabled.

[0112] At the fifth JVET meeting, proposal JVET-E0028 (A. Alshin, E. Alshina, "EE3: bi-directional optical flow w/o block extension", JVET-E0028, January 2017) was submitted to modify the BIO operations and reduce the memory access bandwidth. In this proposal, no MC predictors and gradient values are needed for pixels outside the current block. Moreover, the solving of v_x and v_y for each pixel is modified to use the MC predictors and the gradient values of all the pixels in the current block, as shown in figure 7. In other words, the square window Ω in equation (4) is modified to a window that is equal to the current CU. In addition, a weighting factor w(i′, j′) is considered for deriving v_x and v_y. The weight w(i′, j′) is a function of the position of the central pixel (i, j) and of the positions of the pixels (i′, j′) in the window. With the weighting, each term in equation (7) becomes a weighted sum; for example,

$$s_1 = \sum_{[i',j'] \in \Omega} w(i', j')\big(\tau_1\,\partial I^{(1)}/\partial x + \tau_0\,\partial I^{(0)}/\partial x\big)^2,$$

$$s_6 = \sum_{[i',j'] \in \Omega} w(i', j')\big(I^{(1)} - I^{(0)}\big)\big(\tau_1\,\partial I^{(1)}/\partial y + \tau_0\,\partial I^{(0)}/\partial y\big), \qquad (10)$$

and analogously for s_2, s_3 and s_5.

[0113] Figure 8 shows an example of the modified BIO for an 8x4 block proposed in JVET-E0028. A simplified version of JVET-E0028 has been proposed to address the problem of mismatching results between the block-level and sub-block-level BIO processes: instead of using a neighborhood Ω with all the pixels in the CU, the proposed method modifies the neighborhood Ω to include only 5x5 pixels centered on the current pixel, without any interpolation or gradient calculation for pixel locations outside the current CU.
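As a rough illustration of how a row is filtered with the 6-tap coefficients of Table 1, consider the sketch below. It shows only the horizontal dot product over a row assumed to be already interpolated vertically with BIOfilterS; the tap alignment, the border handling and the de-scaling shift are simplifying assumptions, not the normative procedure.

```cpp
// Hypothetical sketch: 6-tap horizontal gradient filtering with the Table 1
// coefficients, indexed by the fractional position fracX in 1/16-pel steps
// (0, 1/16, ..., 1/2).
#include <cstdint>

static const int kBioFilterG[9][6] = {
    { 8, -39,  -3, 46, -17, 5}, { 8, -32, -13, 50, -18, 5},
    { 7, -27, -20, 54, -19, 5}, { 6, -21, -29, 57, -18, 5},
    { 4, -17, -36, 60, -15, 4}, { 3,  -9, -44, 61, -15, 4},
    { 1,  -4, -48, 61, -13, 3}, { 0,   1, -54, 60,  -9, 2},
    {-1,   4, -57, 57,  -4, 1}};

// row points to a line already interpolated vertically with BIOfilterS;
// shift stands in for the 18-d de-scaling shift described in [0109].
int horGradient(const int16_t* row, int x, int frac /*0..8*/, int shift) {
    const int* f = kBioFilterG[frac];
    int acc = 0;
    for (int t = 0; t < 6; ++t)
        acc += f[t] * row[x - 2 + t];  // assumed tap alignment around x
    return acc >> shift;
}
```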
[0114] Aspects of overlapped block motion compensation (OBMC) in JEM will now be described. OBMC was used in earlier generations of video standards, for example, in H.263. In JEM, OBMC is performed for all motion compensated (MC) block boundaries, except the right and bottom boundaries of a CU. Moreover, it is applied to both the luma and chroma components. In JEM, an MC block corresponds to a coding block. When a CU is coded with a sub-CU mode (which includes the sub-CU merge, affine and FRUC modes), each sub-block of the CU is an MC block. To process CU boundaries in a uniform manner, OBMC is performed at the sub-block level for all MC block boundaries, where the sub-block size is set equal to 4x4, as shown in figure 9.

[0115] When OBMC applies to the current sub-block, in addition to the current motion vectors, the motion vectors of the four connected neighboring sub-blocks, if available and not identical to the current motion vector, are also used to derive a prediction block for the current sub-block. These multiple prediction blocks based on multiple motion vectors are combined to generate the final prediction signal of the current sub-block.

[0116] As shown in figure 10, a prediction block based on one or more motion vectors of a neighboring sub-block is denoted PN, with N indicating an index for the neighboring sub-block above, below, to the left or to the right, and the prediction block based on the motion vectors of the current sub-block is denoted PC. When PN is based on the motion information of a neighboring sub-block that contains the same motion information as the current sub-block, OBMC is not performed from PN. Otherwise, every pixel of PN is added to the same pixel in PC; that is, four rows/columns of PN are added to PC. Weighting factors of {1/4, 1/8, 1/16, 1/32} are used for PN, and weighting factors of {3/4, 7/8, 15/16, 31/32} are used for PC. The exception is small MC blocks (that is, when the height or width of the coding block is equal to 4, or when a CU is coded with a sub-CU mode), for which only two rows/columns of PN are added to PC; in this case, weighting factors of {1/4, 1/8} are used for PN and weighting factors of {3/4, 7/8} are used for PC. For a PN generated based on one or more motion vectors of a vertically (horizontally) neighboring sub-block, the pixels in the same row (column) of PN are added to PC with the same weighting factor. Note that BIO is also applied for the derivation of the PN prediction block.

[0117] In JEM, for a CU with size less than or equal to 256 luma samples, a CU-level flag is signaled to indicate whether or not OBMC is applied for the current CU. For CUs larger than 256 luma samples or not coded with the AMVP mode, OBMC is applied by default. At video encoder 20, when OBMC is applied to a CU, its impact is taken into account during the motion estimation stage. The prediction signal formed using the motion information of the top neighboring block and of the left neighboring block is used to compensate the top and left boundaries of the original signal of the current CU, and then the normal motion estimation process is applied.
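The row-by-row blending of [0116] can be sketched as follows for a PN coming from the sub-block above the current one; the buffer layout and rounding are hypothetical, and a left neighbor would be handled with the corresponding column-wise blending.

```cpp
// Illustrative OBMC blending of PN into PC with the JEM weights of [0116]:
// PN weights {1/4, 1/8, 1/16, 1/32} for the rows nearest the boundary, with
// PC keeping the complement {3/4, 7/8, 15/16, 31/32}.
#include <cstdint>

void obmcBlendFromAbove(int16_t* pc, const int16_t* pn, int stride, int width,
                        int rows /*4, or 2 for small MC blocks*/) {
    static const int wN[4] = {16, 8, 4, 2};  // 1/4, 1/8, 1/16, 1/32 in 1/64 units
    for (int r = 0; r < rows; ++r) {
        for (int x = 0; x < width; ++x) {
            int i = r * stride + x;
            pc[i] = static_cast<int16_t>((pn[i] * wN[r] + pc[i] * (64 - wN[r])) >> 6);
        }
    }
}
```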
[0118] BIO can be considered as a further processing of the CU-level or sub-block-level MC. While existing BIO implementations offer some coding performance improvements, existing implementations also present complexity problems for both software and hardware designs.

[0119] Figure 11 shows a block diagram of the existing BIO design in JEM 5. In figure 11, MC 202 performs bi-predictive motion compensation for a block using two motion vectors (MV0 and MV1) and two reference pictures (Ref0 and Ref1). MC 202 transmits two predictive blocks (predictors P0 and P1) to BIO 204, which performs a BIO process on both predictors to generate the output P, which corresponds to a bi-average of P0/P1 with the BIO offsets added on a per-pixel basis. OBMC 206 performs OBMC on P to produce two updated predictive blocks (P0′ and P1′). BIO 208 then performs a BIO process on the two updated predictors to generate the output P′′, which is the final predictor.

[0120] In the example of figure 11, bi-predictive motion compensation is followed by BIO filtering for both regular MC and OBMC; consequently, BIO processes are invoked multiple times for the same sub-block. This prolongs the overall motion compensation process and also requires the extra bandwidth introduced by BIO on top of OBMC. Existing BIO implementations use division operations to calculate the refined motion vectors, and pixel-based division operations are expensive in hardware designs because, typically, multiple copies of the dividers are required to obtain sufficient throughput, resulting in a high demand for silicon area. Regarding motion estimation, BIO is a process that refines the MV within a small motion search range, and existing BIO implementations update the MC predictors accordingly. However, the motion vectors stored in the MV buffer are not updated after the refinement, causing an asynchronous design between the MC predictors and the associated motion vectors. Furthermore, the motion vector refinement calculation currently employs 6-tap interpolation filters and gradient filters, which results in increased complexity.

[0121] This disclosure describes techniques that can address the problems described above with respect to known BIO implementations. The following techniques can be applied individually or, alternatively, in any combination.

[0122] According to one technique of this disclosure, video encoder 20 and video decoder 30 can implement a block-based BIO scheme designed so that a group of pixels is used to generate a single motion vector refinement for all the pixels in the group. The block size can be a predefined size, including, but not limited to, 2x2 and 4x4.

[0123] Video encoder 20 and video decoder 30 can selectively adapt the block size. For example, video encoder 20 and video decoder 30 can select the block size based on the resolution of the frame being coded, the size of the whole CU, the temporal layer of the current picture, the QP used to code the current picture and/or the coding mode of the current CU.

[0124] Video encoder 20 and video decoder 30 can solve equation (4) for a square window Ω that includes the block itself and a neighborhood of the block being considered. In one example, the size of Ω is 8x8, where the central 4x4 region contains the group of pixels under consideration for calculating the BIO offsets, and the surrounding 2-pixel region is the neighborhood of the block.

[0125] Video encoder 20 and video decoder 30 can use a weighting function, which can take, but is not limited to, the form of equation (10), to provide different weights for pixels at different locations in the window. In one example, pixels located in the central part of Ω are assigned higher weights than pixels located around the boundary of Ω. The weighted average can be used to calculate the averaged value of the terms in equation (7), in order to solve for v_x and v_y for the entire block. In some examples, a median filter can be applied to exclude outliers in the block before calculating the weighted average, to obtain a more stable solution for equation (4).
As an example, when a pixel is traversed as in figure 7 using a 5x5 window, it can be assumed in the applied weighting function that all sample locations contribute 1 to the central sample of the window. A median filter can then be applied, so that samples whose values lie a number of standard deviations (for example, 3) away from the median value of the current 5x5 samples are assigned a weight value of 0.

[0126] Figures 12A-12D show examples of 4x4/2x2 blocks with an extension of 1 or 2 pixels. In one example, the weighting function can be generated using a sliding-window operation as follows:

$$w(x, y) = \sum_{(x', y') \in \Omega(x, y) \cap B} k, \qquad x \in [0, W-1],\; y \in [0, H-1], \qquad (11)$$

where Ω(x,y) is the neighborhood (which shares the same size as the block extension) of the pixel location (x,y), B is the set of pixels for which the gradient values will be calculated (for example, the 4x4/2x2 block), and k is a constant (predefined or signaled via the slice header/PPS/SPS).

[0127] Figures 12A-12D show examples of weighting functions. Figure 12A shows an example of a weighting function for a 4x4 block with an extension of 2 pixels. Figure 12B shows an example of a weighting function for a 4x4 block with an extension of 1 pixel. Figure 12C shows an example of a weighting function for a 2x2 block with an extension of 2 pixels. Figure 12D shows an example of a weighting function for a 2x2 block with an extension of 1 pixel.

[0128] Additionally, if information about whether a pixel belongs to an object occluded between Ref0 and Ref1 is available, neighboring pixels that belong to occluded objects can be assigned lower weights. In one example, the weights of the pixels that belong to occluded objects are set to 0 and, for the other pixels, the weights remain unchanged. This allows pixel-level control over whether a specific pixel location is involved in the BIO derivation. As an example of how to determine whether a pixel is occluded, let the difference between the current sample and the averaged sample of the L0 and L1 predictions be denoted Db, and let the differences between the current sample and the collocated samples of L0 and L1 be denoted D0 and D1, respectively. If Db/D0 >> 1 or Db/D1 >> 1, then the pixel can be identified as occluded.

[0129] The neighborhood range for BIO can be predefined. In some examples, the range can be signaled using the SPS, the PPS or the slice header. In some examples, the range can be made adaptive based on coding information, including, but not limited to, the BIO block size, the CU size or the frame resolution.
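The count-style weight maps of figures 12A-12D suggest one simple way of building the weighting function of equation (11) with a sliding window; the sketch below is a hypothetical construction, with W and H the block size, E the extension in pixels and k the constant.

```cpp
// Hypothetical construction of the equation (11) weight map: each location of
// the extended block accumulates k once for every block pixel whose
// (2E+1)x(2E+1) neighborhood covers it.
#include <vector>

std::vector<int> buildWeightMap(int W, int H, int E, int k) {
    int ew = W + 2 * E, eh = H + 2 * E;        // extended block dimensions
    std::vector<int> w(ew * eh, 0);
    for (int y = 0; y < H; ++y)                // pixels of B (the block itself)
        for (int x = 0; x < W; ++x)
            for (int dy = -E; dy <= E; ++dy)   // neighborhood of (x, y)
                for (int dx = -E; dx <= E; ++dx)
                    w[(y + E + dy) * ew + (x + E + dx)] += k;
    return w;
}
```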
[0130] According to another technique of this disclosure, video encoder 20 and video decoder 30 can update the motion vector of a block after the BIO motion refinement. In this process, video encoder 20 and video decoder 30 can refine the motion vector (or motion field) of a block by adding the motion information offset derived in BIO. The update can take place after the regular MC process of the current block and refine the MV of the current block/CU before OBMC for a subsequent block/CU, so that the updated MV is involved in the OBMC operation of the subsequent blocks/CUs. In some examples, the update can occur after OBMC for the subsequent CUs, in which case the updated motion vector is only used for motion vector prediction.

[0131] Video encoder 20 and video decoder 30 can apply the MV update in the AMVP mode, the merge mode, the FRUC mode, or other inter-prediction modes. In one example, the motion vector refinement update occurs only for the FRUC mode. In one example, the motion vector refinement update occurs only for the merge mode. In one example, the motion vector refinement update occurs only for the AMVP mode. In one example, any combination of two or all of the above can be used.

[0132] In existing BIO implementations, the gradient at a fractional sample position is based on the integer samples of the reference pictures and an additional interpolation process in the horizontal and/or vertical direction. To simplify the gradient calculation process, the gradient can be calculated based on the prediction samples that have already been generated based on the existing MV of the current block/CU. The gradient calculation can be applied to the prediction samples at different stages during the generation of the prediction samples. For example, to generate the prediction samples of a bi-prediction block, the L0 prediction samples and the L1 prediction samples are generated first, and the L0 and L1 prediction samples are then weighted-averaged to generate the bi-prediction samples. When OBMC is enabled, the generated bi-prediction samples are additionally weighted with the prediction samples that use the neighboring MVs to generate the final prediction samples. In this example, the gradient calculation can be applied to the L0 and L1 prediction samples independently; or the gradient calculation can be applied only to the bi-prediction samples and the final prediction samples, under the assumption that the L0 and L1 predictors share the same gradient values. That is, instead of calculating the gradient values separately using Ref0 and Ref1 and adding them together during the derivation of the BIO offsets/motion vectors, the gradient calculation on the bi-prediction samples can obtain the added gradient values in a single step.

[0133] In one implementation, video encoder 20 and video decoder 30 can apply a 2-tap gradient filter to the prediction samples to calculate the gradients. Let the current pixel position in a block be (x, y), and let the MC predictor at that location be denoted P(x, y). The gradient values can be calculated as:

$$G_x(x, y) = \big(P(\min(x+1, W-1), y) - P(\max(x-1, 0), y)\big) \cdot K \gg S, \qquad x \in [0, W-1]$$

$$G_y(x, y) = \big(P(x, \min(H-1, y+1)) - P(x, \max(0, y-1))\big) \cdot K \gg S, \qquad y \in [0, H-1] \qquad (12)$$

where K and S are scaling factors that can be predefined values, W denotes the block width and H denotes the block height. Note that the location (x, y) can be any fractional-pel location after interpolation. In one example, the values can be (24, 12, 8) or (26, 13, 8). These values can be signaled via the SPS, the PPS or the slice header.
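A minimal sketch of equation (12) is shown below; K and S are left as parameters, since the example values given above can be predefined or signaled.

```cpp
// 2-tap gradient of equation (12) over a WxH block of MC predictor samples P,
// with clamping at the block borders. Buffer layout is illustrative.
#include <algorithm>
#include <cstdint>

void gradient2Tap(const int16_t* p, int stride, int W, int H, int K, int S,
                  int* gx, int* gy) {
    for (int y = 0; y < H; ++y) {
        for (int x = 0; x < W; ++x) {
            int right = p[y * stride + std::min(x + 1, W - 1)];
            int left  = p[y * stride + std::max(x - 1, 0)];
            int below = p[std::min(y + 1, H - 1) * stride + x];
            int above = p[std::max(y - 1, 0) * stride + x];
            gx[y * W + x] = ((right - left) * K) >> S;   // Gx(x, y)
            gy[y * W + x] = ((below - above) * K) >> S;  // Gy(x, y)
        }
    }
}
```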
BIO can use reference samples to generate the displacement or it can use the MC / OBMC predictors to generate the displacement. The generated BIO offset is added for MC predictors or OBMC predictors as a motion vector refinement. [0136] Figures 13-18 show examples of simplified BIO designs according to the techniques of this disclosure. The techniques in figures 13-18 can be used in combination with, or as alternatives to the design shown in figure 11. In the examples in figures 13-18, the squares labeled MC, BIO and OBMC, generally perform the same functions as MC 202, BIO 204, OBMC, 206 and BIO 208 described above. [0137] Figure 13 shows an example of a simplified BIO design according to the techniques of this disclosure. Figure 13 shows an example of BIO derived from RefO / Refl and applied to MC P0 / P1 predictors. The BIO process in OBMC is removed. BIO offsets are derived from MV0 / MV1, RefO / Refl, and MC P0 / P1 predictor, and offsets are added to P0 / P1 during Bi mean. The P 'predictor is the final predictor of the overall MC process. The dotted lines indicate the motion vector information in the figure and the Petition 870190089120, of 09/09/2019, p. 60/120 54/75 solid lines indicate the effective pixel data for reference or prediction samples. In figure 13, the BIO after MC operation uses the MC P0 / P1 predictors together with the gradient values derived from RefO / Refl using MV0 / MV1 motion vectors to calculate the motion vector refinement and displacements. The output of the BIO P is generated by the average of P0 / P1 added by BIO shifts on a per pixel basis (even with BIO at the block level where the motion vector refinement remains the same in the block, the BIO shift can still be on a per pixel basis since the gradient values for each pixel can be different). [0138] Figure 14 shows an example of a simplified BIO design according to the techniques of this disclosure. Figure 14 shows an example of BIO derived from RefO / Refl and applied to OBMC PO '/ P1' predictors, and the offsets are added to PO '/ Pl' during Bi mean. The predictor P '' is the final predictor of the overall MC process. [0139] Figure 15 shows an example of a simplified BIO design according to the techniques of this disclosure. Figure 15 shows an example of BIO derived from / applied to predictors MC P0 / P1. Gradient values are calculated using MV0 / MV1 and RefO / Refl, and then generate the BIO offsets together with the MC P0 / P1 predictor. Offsets are added to the OBMC predictor P 'to generate the final predictor P' 'of the overall MC process. [0140] Figure 16 shows an example of a simplified BIO design according to the techniques of this disclosure. Figure 16 shows an example of BIO derived Petition 870190089120, of 09/09/2019, p. 61/120 55/75 of / applied to predictors of MC P0 / P1. BIO offsets are calculated using the MC P0 / P1 predictors, and the offsets are added to P0 / P1 during Bi mean, followed by an OBMC process to generate the final P '' predictor of the overall MC process. [0141] Figure 17 shows an example of a simplified BIO design according to the techniques of this disclosure. Figure 17 shows an example of a simplified BIO using only the OBMC predictor. gradient values are derived using the OBMC PO '/ Pl' predictors and MV0 / MV1 motion vectors, and the BIO offsets are calculated using the OBMC PO '/ Pl' predictors. The displacements are added to PO '/ Pl' during Bi mean to generate the final P '' predictor of the general MC process. 
[0142] In one example, video encoder 20 and video decoder 30 can conditionally disable BIO in OBMC. Let MVcur_x and MVnbr_x be the motion vectors of the current block and of the neighboring block for List x (where x is 0 or 1) during the OBMC process. In one example, if the absolute value of the motion vector difference between MVcur0 and MVnbr0 and the absolute value of the motion vector difference between MVcur1 and MVnbr1 are both less than a threshold, BIO in OBMC can be disabled. The threshold can be signaled via the SPS/PPS/slice header, or a predefined value can be used (for example, half a pixel, one pixel, or any value equal to the search range of the BIO motion vector refinement). In another example, if the absolute value of the motion vector difference between MVnbr0 and MVnbr1 is less than a threshold, BIO in OBMC can be disabled.

[0143] In one example, video encoder 20 and video decoder 30 can cap the number of BIO operations in the overall MC process at a predetermined value. For example, the BIO process is performed at most N times (for example, N can be 1 or any positive integer) for each block (the block can be a CTU, a CU, a PU or an MxN block). In one example, BIO is only allowed to run once for each block: when the prediction samples are generated using the current motion information with BIO applied, no additional BIO is allowed when generating the other prediction samples for the current block, such as in OBMC or any other method of refining the prediction samples. However, when the prediction samples are generated using the current motion information with no BIO applied, at most one BIO is allowed for the generation of the other prediction samples for the current block, such as in OBMC or any other method of refining the prediction samples.

[0144] According to techniques of this disclosure, video encoder 20 and video decoder 30 can implement a block-based design for BIO. Instead of the pixel-level motion refinement of JEM5, the motion refinement is performed based on a 4x4 block. In block-based BIO, the weighted sum of the gradients of the samples in a 4x4 block is used to derive the BIO motion vector offsets for the block.

[0145] The other processes, such as the calculation of the gradients, BIO motion vectors and offsets, can, for example, follow the same procedures as in the current JEM. After the 4x4 MV refinement is obtained with block-based BIO, the MV buffer is updated and used for the coding of subsequent CUs. The overall block diagram is shown in figure 18, where OBMC is applied without a BIO operation.

[0146] Simulation results for RA and LDB are shown in the following tables.

Random Access Main 10 (over JEM-5.0.1)

                   Y        U        V       EncT    DecT
  Class A1        -0.1%    -0.4%    -0.3%    91%     90%
  Class A2        -0.1%    -0.2%    -0.3%    88%     84%
  Class B         -0.1%    -0.2%    -0.1%    88%     83%
  Class C          0.1%    -0.2%    -0.2%    92%     85%
  Class D          0.3%    -0.2%    -0.2%    89%     84%
  Class E           -        -        -       -       -
  Overall (Ref)    0.0%    -0.2%    -0.2%    90%     85%

Low Delay B Main (over JEM-5.0.1)

                   Y        U        V       EncT    DecT
  Class A1          -        -        -       -       -
  Class A2          -        -        -       -       -
  Class B          0.0%     0.4%     0.1%    93%     89%
  Class C          0.1%     0.2%     0.2%    96%     91%
  Class D          0.0%     0.2%    -0.5%    94%     90%
  Class E         -0.1%     0.6%     0.0%    96%     89%
  Overall (Ref)    0.0%     0.3%     0.0%    95%     90%
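Referring back to the conditional disabling of [0142], the check can be sketched as follows; the quarter-pel units and the default one-pixel threshold are illustrative assumptions.

```cpp
// Hypothetical sketch: skip BIO inside OBMC when both the List0 and List1 MV
// differences between the current and neighboring sub-blocks are below a
// threshold, as in the first example of [0142].
#include <cstdlib>

struct Mv { int x, y; };  // assumed quarter-pel units

static bool mvDiffBelow(const Mv& a, const Mv& b, int thr) {
    return std::abs(a.x - b.x) < thr && std::abs(a.y - b.y) < thr;
}

bool bioEnabledInObmc(const Mv& cur0, const Mv& nbr0,
                      const Mv& cur1, const Mv& nbr1, int thr = 4 /*one pel*/) {
    return !(mvDiffBelow(cur0, nbr0, thr) && mvDiffBelow(cur1, nbr1, thr));
}
```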
[0147] Figure 19 is a block diagram illustrating an example of a video encoder 20 that can implement techniques for bidirectional optical flow. Video encoder 20 can perform intra- and inter-coding of video blocks within video slices. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in the video within a given video frame or picture. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in the video within adjacent frames or pictures of a video sequence. Intra-mode (I mode) can refer to any of several spatial-based coding modes. Inter-modes, such as unidirectional prediction (P mode) or bi-prediction (B mode), can refer to any of several temporal-based coding modes.

[0148] As shown in figure 19, video encoder 20 receives a current video block within a video frame to be encoded. In the example of figure 19, video encoder 20 includes mode selection unit 40, reference picture memory 64 (which can also be referred to as a decoded picture buffer (DPB)), adder 50, transform processing unit 52, quantization unit 54 and entropy encoding unit 56. Mode selection unit 40, in turn, includes motion compensation unit 44, motion estimation unit 42, intra-prediction unit 46 and partition unit 48. For video block reconstruction, video encoder 20 also includes inverse quantization unit 58, inverse transform unit 60 and adder 62. A deblocking filter (not shown in figure 19) can also be included to filter block boundaries in order to remove blockiness artifacts from the reconstructed video. If desired, the deblocking filter would typically filter the output of adder 62. Additional filters (in-loop or post-loop) can also be used in addition to the deblocking filter. Such filters are not shown for brevity but, if desired, they can filter the output of adder 62 (as an in-loop filter).

[0149] During the encoding process, video encoder 20 receives a video frame or slice to be encoded. The frame or slice can be divided into multiple video blocks. Motion estimation unit 42 and motion compensation unit 44 perform inter-predictive encoding of the received video block relative to one or more blocks in one or more reference frames to provide temporal prediction. Intra-prediction unit 46 can alternatively intra-predict the received video block using pixels of one or more neighboring blocks in the same frame or slice as the block to be encoded, to provide spatial prediction. Video encoder 20 can perform multiple encoding passes, for example, to select an appropriate encoding mode for each block of video data.

[0150] Furthermore, partition unit 48 can partition blocks of video data into sub-blocks, based on the evaluation of previous partitioning schemes in previous encoding passes. For example, partition unit 48 can initially partition a frame or slice into LCUs, and partition each of the LCUs into sub-CUs based on rate-distortion analysis (for example, rate-distortion optimization). Mode selection unit 40 can additionally produce a quadtree data structure indicative of the partitioning of an LCU into sub-CUs. Leaf-node CUs of the quadtree can include one or more PUs and one or more TUs.

[0151] Mode selection unit 40 can select one of the prediction modes, intra or inter, for example, based on error results, and provides the resulting predicted block to adder 50 to generate residual data and to adder 62 to reconstruct the encoded block for use as part of a reference frame. Mode selection unit 40 also provides syntax elements, such as motion vectors, intra-mode indicators, partition information and other such syntax information, to entropy encoding unit 56.
[0152] Motion estimation unit 42 and motion compensation unit 44 can be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors, which estimate the motion of video blocks. A motion vector, for example, can indicate the displacement of a PU of a video block within a reference frame (or other coded unit) relative to the current block being coded within the current frame (or other coded unit). A predictive block is a block that is found to closely match the block to be coded in terms of pixel difference, which can be determined by the sum of absolute differences (SAD), the sum of squared differences (SSD) or other difference metrics. In some examples, video encoder 20 can calculate values for sub-integer pixel positions of the reference pictures stored in reference picture memory 64. For example, video encoder 20 can interpolate values for one-quarter pixel positions, one-eighth pixel positions or other fractional pixel positions of the reference picture. Therefore, motion estimation unit 42 can perform a motion search relative to the full pixel positions and the fractional pixel positions and output a motion vector with fractional pixel precision.
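As a minimal sketch of the SAD metric mentioned in [0152], assuming 8-bit samples and hypothetical buffer strides:

```cpp
// Sum of absolute differences between the block being coded and a candidate
// predictive block; a lower value indicates a closer match.
#include <cstdint>
#include <cstdlib>

int sad(const uint8_t* cur, int curStride,
        const uint8_t* ref, int refStride, int width, int height) {
    int acc = 0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            acc += std::abs(int(cur[y * curStride + x]) - int(ref[y * refStride + x]));
    return acc;
}
```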
[0153] Motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU with the position of a predictive block of a reference picture. The reference picture can be selected from a first reference picture list (List 0) or a second reference picture list (List 1), each of which identifies one or more reference pictures stored in reference picture memory 64. Motion estimation unit 42 sends the calculated motion vector to entropy encoding unit 56 and motion compensation unit 44.

[0154] Motion compensation, performed by motion compensation unit 44, can involve fetching or generating the predictive block based on the motion vector determined by motion estimation unit 42. Again, motion estimation unit 42 and motion compensation unit 44 can be functionally integrated, in some examples. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 can locate the predictive block to which the motion vector points in one of the reference picture lists. Adder 50 forms a residual video block by subtracting the pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values, as discussed below. In general, motion estimation unit 42 performs motion estimation with respect to the luma components, and motion compensation unit 44 uses the motion vectors calculated based on the luma components for both the chroma components and the luma components. Mode selection unit 40 can also generate syntax elements associated with the video blocks and the video slice for use by video decoder 30 in decoding the video blocks of the video slice.

[0155] In addition, motion compensation unit 44 can be configured to perform any or all of the techniques of this disclosure (individually or in any combination). Although discussed with respect to motion compensation unit 44, it should be understood that mode selection unit 40, motion estimation unit 42, partition unit 48 and/or entropy encoding unit 56 can also be configured to perform certain techniques of this disclosure, individually or in combination with motion compensation unit 44. In one example, motion compensation unit 44 can be configured to perform the BIO techniques discussed herein.

[0156] Intra-prediction unit 46 can intra-predict a current block, as an alternative to the inter-prediction performed by motion estimation unit 42 and motion compensation unit 44, as described above. In particular, intra-prediction unit 46 can determine an intra-prediction mode to use to encode a current block. In some examples, intra-prediction unit 46 can encode a current block using various intra-prediction modes, for example, during separate encoding passes, and intra-prediction unit 46 (or mode selection unit 40, in some examples) can select an appropriate intra-prediction mode to use from the tested modes.

[0157] For example, intra-prediction unit 46 can calculate rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes and select the intra-prediction mode having the best rate-distortion characteristics among the tested modes. The rate-distortion analysis generally determines an amount of distortion (or error) between an encoded block and an original, unencoded block that was encoded to produce the encoded block, as well as a bit rate (that is, a number of bits) used to produce the encoded block. Intra-prediction unit 46 can calculate ratios from the distortions and rates of the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
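The rate-distortion selection described in [0157] is commonly expressed as minimizing a cost J = D + λ·R over the tested modes; the sketch below illustrates this under assumed inputs (the distortion, rate and λ values are placeholders).

```cpp
// Pick the mode with the lowest rate-distortion cost J = D + lambda * R.
#include <limits>
#include <vector>

struct ModeResult { int mode; double distortion; double bits; };

int pickBestMode(const std::vector<ModeResult>& tested, double lambda) {
    int best = -1;
    double bestCost = std::numeric_limits<double>::max();
    for (const auto& m : tested) {
        double j = m.distortion + lambda * m.bits;
        if (j < bestCost) { bestCost = j; best = m.mode; }
    }
    return best;
}
```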
[0158] After selecting an intra-prediction mode for a block, intra-prediction unit 46 can provide information indicative of the intra-prediction mode selected for the block to entropy encoding unit 56. Entropy encoding unit 56 can encode the information indicating the selected intra-prediction mode. Video encoder 20 can include, in the transmitted bitstream, configuration data, which can include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as code-word mapping tables), definitions of encoding contexts for various blocks, and indications of a most probable intra-prediction mode, an intra-prediction mode index table and a modified intra-prediction mode index table to use for each of the contexts.

[0159] Video encoder 20 forms a residual video block by subtracting the prediction data from mode selection unit 40 from the original video block being encoded. Adder 50 represents the component or components that perform this subtraction operation. Transform processing unit 52 applies a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform, to the residual block, producing a video block comprising transform coefficient values. Wavelet transforms, integer transforms, sub-band transforms, discrete sine transforms (DSTs) or other types of transforms could be used instead of a DCT. In any case, transform processing unit 52 applies the transform to the residual block, producing a block of transform coefficients. The transform can convert the residual information from a pixel domain to a transform domain, such as a frequency domain. Transform processing unit 52 can send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process can reduce the bit depth associated with some or all of the coefficients. The degree of quantization can be modified by adjusting a quantization parameter.

[0160] After the quantization, entropy encoding unit 56 entropy encodes the quantized transform coefficients. For example, entropy encoding unit 56 can perform context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding or another entropy coding technique. In the case of context-based entropy coding, the context can be based on neighboring blocks. After the entropy coding by entropy encoding unit 56, the encoded bitstream can be transmitted to another device (for example, video decoder 30), or archived for later transmission or retrieval.

[0161] Inverse quantization unit 58 and inverse transform unit 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain. In particular, adder 62 adds the reconstructed residual block to the motion compensated prediction block produced earlier by motion compensation unit 44 or intra-prediction unit 46, to produce a reconstructed video block for storage in reference picture memory 64. The reconstructed video block can be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-code a block in a subsequent video frame.

[0162] Figure 20 is a block diagram illustrating an example video decoder 30 that can implement techniques for bidirectional optical flow. In the example of figure 20, video decoder 30 includes entropy decoding unit 70, motion compensation unit 72, intra-prediction unit 74, inverse quantization unit 76, inverse transform unit 78, reference picture memory 82 and adder 80. Video decoder 30 can, in some examples, perform a decoding pass generally reciprocal to the encoding pass described with respect to video encoder 20 (figure 19). Motion compensation unit 72 can generate prediction data based on motion vectors received from entropy decoding unit 70, while intra-prediction unit 74 can generate prediction data based on intra-prediction mode indicators received from entropy decoding unit 70.

[0163] During the decoding process, video decoder 30 receives an encoded video bitstream that represents video blocks of an encoded video slice, and associated syntax elements, from video encoder 20. Entropy decoding unit 70 of video decoder 30 entropy decodes the bitstream to generate quantized coefficients, motion vectors or intra-prediction mode indicators, and other syntax elements. Entropy decoding unit 70 forwards the motion vectors and other syntax elements to motion compensation unit 72. Video decoder 30 can receive the syntax elements at the video slice level and/or the video block level.
[0164] When the video slice is coded as an intra-coded slice (I), intra-prediction unit 74 can generate prediction data for a video block of the current video slice based on a signaled intra-prediction mode and data from previously decoded blocks of the current frame or picture. When the video frame is coded as an inter-coded slice (i.e., B, P or GPB), motion compensation unit 72 produces predictive blocks for a video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 70. The predictive blocks can be produced from one of the reference pictures within one of the reference picture lists. Video decoder 30 can construct the reference frame lists, List 0 and List 1, using default construction techniques based on reference pictures stored in reference picture memory 82.

[0165] Motion compensation unit 72 determines prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to produce the predictive blocks for the current video block being decoded. For example, motion compensation unit 72 uses some of the received syntax elements to determine a prediction mode (for example, intra- or inter-prediction) used to code the video blocks of the video slice, an inter-prediction slice type (for example, B slice, P slice or GPB slice), construction information for one or more of the reference picture lists for the slice, motion vectors for each inter-coded video block of the slice, and other information for decoding the video blocks of the current video slice.

[0166] Motion compensation unit 72 can also perform interpolation based on interpolation filters for sub-pixel precision. Motion compensation unit 72 can use interpolation filters as used by video encoder 20 during the encoding of the video blocks to calculate interpolated values for sub-integer pixels of the reference blocks. In this case, motion compensation unit 72 can determine the interpolation filters used by video encoder 20 from the received syntax elements and use the interpolation filters to produce the predictive blocks.

[0167] In addition, motion compensation unit 72 can be configured to perform any or all of the techniques of this disclosure (individually or in any combination). For example, motion compensation unit 72 can be configured to perform the BIO techniques discussed in the present disclosure.

[0168] Inverse quantization unit 76 inverse quantizes, that is, dequantizes, the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 70. The inverse quantization process can include the use of a quantization parameter QPy calculated by video decoder 30 for each video block in the video slice, to determine a degree of quantization and, likewise, a degree of inverse quantization that should be applied.

[0169] Inverse transform unit 78 applies an inverse transform, for example, an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to produce residual blocks in the pixel domain.
[0170] After motion compensation unit 72 generates the predictive block for the current video block based on the motion vectors and other syntax elements, video decoder 30 forms a decoded video block by adding the residual blocks from inverse transform unit 78 to the corresponding predictive blocks generated by motion compensation unit 72. Adder 80 represents the component or components that perform this summing operation. If desired, a deblocking filter can also be applied to filter the decoded blocks in order to remove blockiness artifacts. Other loop filters (in the coding loop or after the coding loop) can also be used to smooth pixel transitions or otherwise improve the video quality. The video blocks decoded in a given frame or picture are then stored in reference picture memory 82, which stores reference pictures used for subsequent motion compensation. Reference picture memory 82 also stores decoded video for later presentation on a display device, such as display device 32 of figure 1. For example, reference picture memory 82 can store decoded pictures.

[0171] Figure 21 is a flowchart illustrating an example video decoding technique described in this disclosure. The techniques of figure 21 will be described with reference to a generic video decoder, such as, but not limited to, video decoder 30. In some examples, the techniques of figure 21 can be performed by a video encoder such as video encoder 20, in which case the generic video decoder corresponds to the decoding loop of the video encoder.

[0172] In the example of figure 21, the video decoder determines that a block of video data is encoded using a bidirectional inter-prediction mode (220). The video decoder determines that the block of video data is encoded using a BIO process (222). The video decoder inter-predicts the block of video data according to the bidirectional inter-prediction mode (224). To inter-predict the block of video data, the video decoder can locate a first reference block in a first image, locate a second reference block in a second reference image, and generate a first predictive block based on the first reference block and the second reference block. The group of pixels belongs to the first predictive block.

[0173] The video decoder performs the BIO process for the block by determining a single motion vector refinement for a group of pixels in the block, and refines the group of pixels based on the single motion vector refinement (226). The group of pixels includes at least two pixels. To perform the BIO process for the block, the video decoder can apply the BIO process to the group of pixels of the first predictive block to generate the refined BIO predictive block. The group of pixels can, for example, be a 4x4 block.
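A sketch of step (226) under the block-based scheme described earlier: a single (v_x, v_y) refinement derived once per 4x4 group is applied to every pixel of the group, while the per-pixel offset still varies with the per-pixel gradient terms. The combined gradient buffers and the fixed-point shift are hypothetical.

```cpp
// Apply one (vx, vy) refinement to all pixels of a 4x4 group. gxc/gyc hold the
// combined per-pixel gradient terms (e.g., tau1*gx1 - tau0*gx0); layout and
// scaling are illustrative assumptions.
#include <cstdint>

void refineGroup(int16_t* pred, const int* gxc, const int* gyc,
                 int stride, int x0, int y0, int vx, int vy, int shift) {
    for (int y = y0; y < y0 + 4; ++y) {
        for (int x = x0; x < x0 + 4; ++x) {
            int i = y * stride + x;
            int offset = (vx * gxc[i] + vy * gyc[i]) >> shift;  // per-pixel BIO offset
            pred[i] = static_cast<int16_t>(pred[i] + offset);
        }
    }
}
```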
[0174] To refine the group of pixels based on the single motion vector refinement, the video decoder can, for example, apply the same refinement to all the pixels in the group. To determine the single motion vector refinement for the group of pixels, the video decoder can determine a motion vector field for a window of pixels that includes the group of pixels and pixels in a region surrounding the group of pixels. The window can, for example, be an 8x8 block of pixels, a 6x6 block of pixels, or a window of some other size. To determine the motion vector field for the window of pixels, the video decoder can, for example, apply a first weight to a pixel adjacent to a boundary of the window and apply a second weight to a pixel not adjacent to any boundary of the window, with the second weight being greater than the first weight. To determine the motion vector field for the window of pixels, the video decoder can apply a median filter to the window of pixels.

[0175] The video decoder transmits a refined BIO predictive block of video data that includes the refined group of pixels (228). The refined BIO predictive block can be subjected to further processing, such as an OBMC process and/or one or more loop filters, before being transmitted. In examples where the video decoder is part of a video encoder, the video decoder can transmit the refined BIO predictive block of video data by storing a decoded picture including the refined BIO predictive block of video data in a decoded picture buffer, for use as a reference picture in the encoding of subsequent pictures of video data. In examples where the video decoder is decoding the video data for display, the video decoder can transmit the refined BIO predictive block of video data by storing a decoded picture including the refined BIO predictive block of video data in a decoded picture buffer, for use as a reference picture in the decoding of subsequent pictures of video data, and by transmitting the decoded picture including the refined BIO predictive block of video data, possibly after further processing such as the application of one or more loop filters, to a display device.

[0176] It should be recognized that, depending on the example, certain acts or events of any of the techniques described herein can be performed in a different sequence, can be added, merged or left out entirely (for example, not all of the described acts or events are necessary for the practice of the techniques). Moreover, in certain examples, acts or events can be executed simultaneously, for example, through multi-threaded processing, interrupt processing or multiple processors, rather than sequentially.

[0177] In one or more examples, the functions described can be implemented in hardware, software, firmware or any combination thereof. If implemented in software, the functions can be stored on, or transmitted over, a computer-readable medium as one or more instructions or code, and executed by a hardware-based processing unit. Computer-readable media can include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates the transfer of a computer program from one place to another, for example, according to a communication protocol. In this manner, computer-readable media can generally correspond to (1) tangible computer-readable storage media, which is non-transitory, or (2) a communication medium such as a signal or carrier wave. Data storage media can be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for the implementation of the techniques described in this disclosure. A computer program product can include a computer-readable medium.

[0178] By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer.
Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL) or wireless technologies such as infrared, radio and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL or wireless technologies such as infrared, radio and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals or other transient media, but are instead directed to non-transient, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

[0179] The instructions can be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs) or other equivalent discrete or integrated logic circuitry. Accordingly, the term "processor", as used herein, can refer to any of the foregoing structures or any other structure suitable for the implementation of the techniques described herein. In addition, in some aspects, the functionality described herein can be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

[0180] The techniques of this disclosure can be implemented in a wide variety of devices or apparatuses, including a wireless telephone handset, an integrated circuit (IC) or a set of ICs (for example, a chip set). Various components, modules or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, various units can be combined in a codec hardware unit or provided by a collection of interoperable hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

[0181] Various examples have been described herein. These and other examples are within the scope of the following claims.
Claims:
1. Method of decoding video data, the method comprising: determining that a block of video data is encoded using a bidirectional inter-prediction mode; determining that the block of video data is encoded using a bidirectional optical flow (BIO) process; inter-predicting the block of video data according to the bidirectional inter-prediction mode; performing the BIO process for the block, wherein performing the BIO process for the block comprises determining a single motion vector refinement for a group of pixels in the block and refining the group of pixels based on the single motion vector refinement, wherein the group of pixels comprises at least two pixels; and transmitting a refined BIO predictive block of video data comprising the refined group of pixels.

2. Method according to claim 1, wherein the group of pixels comprises a 4x4 block.

3. Method according to claim 1, wherein refining the group of pixels based on the single motion vector refinement comprises applying the same refinement to all the pixels in the group.

4. Method according to claim 1, wherein: inter-predicting the block of video data comprises locating a first reference block in a first image, locating a second reference block in a second reference image, and generating a first predictive block based on the first reference block and the second reference block, wherein the group of pixels belongs to the first predictive block; and performing the BIO process for the block comprises applying the BIO process to the group of pixels of the first predictive block to generate the refined BIO predictive block.

5. Method according to claim 1, wherein determining the single motion vector refinement for the group of pixels comprises determining a motion vector field for a window of pixels, wherein the window of pixels comprises the group of pixels and pixels in a region surrounding the group of pixels.

6. Method according to claim 5, wherein the window comprises an 8x8 block of pixels.

7. Method according to claim 5, wherein the window comprises a 6x6 block of pixels.

8. Method according to claim 5, wherein determining the motion vector field for the window of pixels comprises: applying a first weight to a pixel adjacent to a boundary of the window; and applying a second weight to a pixel not adjacent to any boundary of the window, wherein the second weight is greater than the first weight.

9. Method according to claim 5, wherein determining the motion vector field for the window of pixels comprises: applying a median filter to the window of pixels.

10. Method according to claim 1, further comprising: applying an overlapped block motion compensation (OBMC) process to the refined BIO predictive block.

11. Method according to claim 1, wherein the method of decoding video data is performed as part of a reconstruction loop of a video encoding process.
12. Device for decoding video data, the device comprising: a memory configured to store video data; and one or more processors configured to: determine that a block of video data is encoded using a bidirectional inter-prediction mode; determine that the block of video data is encoded using a bidirectional optical flow (BIO) process; inter-predict the block of video data according to the bidirectional inter-prediction mode; perform the BIO process for the block, wherein, to perform the BIO process for the block, the one or more processors are configured to determine a single motion vector refinement for a group of pixels in the block, wherein the group of pixels comprises at least two pixels, and to refine the group of pixels based on the single motion vector refinement; and transmit a refined BIO predictive block of video data comprising the refined group of pixels.

13. Device according to claim 12, wherein the group of pixels comprises a 4x4 block.

14. Device according to claim 12, wherein, to refine the group of pixels based on the single motion vector refinement, the one or more processors are configured to apply the same refinement to all the pixels in the group.

15. Device according to claim 12, wherein: to inter-predict the block of video data, the one or more processors are configured to locate a first reference block in a first image, locate a second reference block in a second reference image, and generate a first predictive block based on the first reference block and the second reference block, wherein the group of pixels belongs to the first predictive block; and to perform the BIO process for the block, the one or more processors are configured to apply the BIO process to the group of pixels of the first predictive block to generate the refined BIO predictive block.

16. Device according to claim 12, wherein, to determine the single motion vector refinement for the group of pixels, the one or more processors are configured to determine a motion vector field for a window of pixels, wherein the window of pixels comprises the group of pixels and pixels in a region surrounding the group of pixels.

17. Device according to claim 16, wherein the window comprises an 8x8 block of pixels.

18. Device according to claim 16, wherein the window comprises a 6x6 block of pixels.

19. Device according to claim 16, wherein, to determine the motion vector field for the window of pixels, the one or more processors are configured to: apply a first weight to a pixel adjacent to a boundary of the window; and apply a second weight to a pixel not adjacent to any boundary of the window, wherein the second weight is greater than the first weight.

20. Device according to claim 16, wherein, to determine the motion vector field for the window of pixels, the one or more processors are configured to: apply a median filter to the window of pixels.

21. Device according to claim 12, wherein the one or more processors are configured to: apply an overlapped block motion compensation (OBMC) process to the refined BIO predictive block.

22. Device according to claim 12, wherein the device comprises a wireless communication device, further comprising a receiver configured to receive encoded video data.

23. Device according to claim 22, wherein the wireless communication device comprises a telephone apparatus and wherein the receiver is configured to demodulate, according to a wireless communication standard, a signal comprising the encoded video data.
24. The device of claim 12, wherein the device comprises a wireless communication device further comprising a transmitter configured to transmit encoded video data.

25. The device of claim 24, wherein the wireless communication device comprises a telephone handset and wherein the transmitter is configured to modulate, according to a wireless communication standard, a signal comprising the encoded video data.

26. An apparatus for decoding video data, the apparatus comprising:
means for determining that a block of video data is encoded using a bidirectional inter-prediction mode;
means for determining that the block of video data is encoded using a bidirectional optical flow (BIO) process;
means for inter-predicting the block of video data according to the bidirectional inter-prediction mode;
means for running the BIO process for the block, wherein the means for running the BIO process for the block comprises means for determining a single motion vector refinement for a group of pixels in the block and means for refining the group of pixels based on the single motion vector refinement, wherein the group of pixels comprises at least two pixels; and
means for transmitting a refined BIO predictive block of video data comprising the refined group of pixels.

27. The apparatus of claim 26, wherein the means for refining the group of pixels based on the single motion vector refinement comprises means for applying the same refinement to all pixels in the group.

28. The apparatus of claim 26, wherein:
the means for inter-predicting the block of video data comprises means for locating a first reference block in a first reference image, means for locating a second reference block in a second reference image, and means for generating a first predictive block based on the first reference block and the second reference block, wherein the group of pixels belongs to the first predictive block; and
the means for running the BIO process for the block comprises means for applying the BIO process to the group of pixels of the first predictive block to generate the refined BIO predictive block.

29. The apparatus of claim 26, wherein the means for determining the single motion vector refinement for the group of pixels comprises means for determining a motion vector field for a window of pixels, wherein the window of pixels comprises the group of pixels and pixels in a region surrounding the group of pixels.

30. A computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to:
determine that a block of video data is encoded using a bidirectional inter-prediction mode;
determine that the block of video data is encoded using a bidirectional optical flow (BIO) process;
inter-predict the block of video data according to the bidirectional inter-prediction mode;
run the BIO process for the block, wherein to run the BIO process for the block, the instructions cause the one or more processors to determine a single motion vector refinement for a group of pixels in the block and to refine the group of pixels based on the single motion vector refinement, wherein the group of pixels comprises at least two pixels; and
transmit a refined BIO predictive block of video data comprising the refined group of pixels.
31. The computer-readable storage medium of claim 30, wherein to refine the group of pixels based on the single motion vector refinement, the instructions cause the one or more processors to apply the same refinement to all pixels in the group.

32. The computer-readable storage medium of claim 30, wherein:
to inter-predict the block of video data, the instructions cause the one or more processors to locate a first reference block in a first reference image, locate a second reference block in a second reference image, and generate a first predictive block based on the first reference block and the second reference block, wherein the group of pixels belongs to the first predictive block; and
to run the BIO process for the block, the instructions cause the one or more processors to apply the BIO process to the group of pixels of the first predictive block to generate the refined BIO predictive block.

33. The computer-readable storage medium of claim 30, wherein to determine the single motion vector refinement for the group of pixels, the instructions cause the one or more processors to determine a motion vector field for a window of pixels, wherein the window of pixels comprises the group of pixels and pixels in a region surrounding the group of pixels.
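Note (illustrative only, not part of the claims): claims 5-9, 16-20, 29 and 33 derive the group's single refinement from an extended window rather than from the group alone. The following is a minimal sketch under stated assumptions — the inputs psx, psy and dt are the same summed-gradient and predictor-difference terms as in the earlier sketch, restricted to the window; all names and the specific weight values are hypothetical. Claims 8/19 are modeled by giving window-boundary samples a smaller weight than interior samples; claims 9/20 would instead apply a median filter over the window (e.g., numpy.median over per-pixel vx/vy estimates), which is not shown here.

```python
import numpy as np

def window_weights(win=8, w_boundary=1.0, w_interior=2.0):
    """Claims 8/19 (sketch): a first, smaller weight for pixels adjacent to
    the window boundary; a second, larger weight for all other pixels."""
    wts = np.full((win, win), w_interior)
    wts[0, :] = wts[-1, :] = wts[:, 0] = wts[:, -1] = w_boundary
    return wts

def mv_refinement_over_window(psx, psy, dt, wts):
    """Claims 5/16 (sketch): fit one (vx, vy) for the group over the whole
    window, e.g. a 4x4 group inside an 8x8 window (claims 6/17) or a
    6x6 window (claims 7/18), with per-pixel weights."""
    s1 = float(np.sum(wts * psx * psx))
    s2 = float(np.sum(wts * psx * psy))
    s5 = float(np.sum(wts * psy * psy))
    s3 = -float(np.sum(wts * dt * psx))
    s6 = -float(np.sum(wts * dt * psy))
    det = s1 * s5 - s2 * s2
    if abs(det) < 1e-9:   # flat window: no reliable refinement
        return 0.0, 0.0
    return (s3 * s5 - s2 * s6) / det, (s1 * s6 - s2 * s3) / det

if __name__ == "__main__":
    # Toy usage on random 8x8 window data.
    rng = np.random.default_rng(0)
    psx, psy, dt = (rng.standard_normal((8, 8)) for _ in range(3))
    print(mv_refinement_over_window(psx, psy, dt, window_weights()))
```

A plausible motivation for the weighting (an assumption here, not a statement from the source) is that samples far from the group contribute context for a more stable fit but should influence the group's refinement less than the samples at the window's center.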